diff --git a/apps/SplunkAdmins/LICENSE b/apps/SplunkAdmins/LICENSE new file mode 100644 index 00000000..8dada3ed --- /dev/null +++ b/apps/SplunkAdmins/LICENSE @@ -0,0 +1,201 @@ + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. 
Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. 
Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "{}" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright {yyyy} {name of copyright owner} + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. 
+   You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
diff --git a/apps/SplunkAdmins/NOTICE b/apps/SplunkAdmins/NOTICE
new file mode 100644
index 00000000..09d9591d
--- /dev/null
+++ b/apps/SplunkAdmins/NOTICE
@@ -0,0 +1,13 @@
+Copyright 2017 Gareth Anderson
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
diff --git a/apps/SplunkAdmins/README.md b/apps/SplunkAdmins/README.md
new file mode 100644
index 00000000..34488334
--- /dev/null
+++ b/apps/SplunkAdmins/README.md
@@ -0,0 +1,2183 @@
+## SplunkBase
+Also available on SplunkBase as [Alerts for Splunk Admins](https://splunkbase.splunk.com/app/3796/)
+For some searches you will need the companion app, TA-Alerts for Splunk Admins: [TA-Alerts for SplunkAdmins github](https://github.com/gjanders/TA-SplunkAdmins/) or [TA-Alerts for SplunkAdmins splunkbase](https://splunkbase.splunk.com/app/6518/)
+
+You may also be interested in [VersionControl For Splunk](https://splunkbase.splunk.com/app/4355/) or perhaps [Decrypt2](https://splunkbase.splunk.com/app/5565/)
+
+## Introduction
+This application accompanies the Splunk conf 2017 presentation "How did you get so big? Tips and tricks for growing your Splunk installation from 50GB/day to 1TB/day"
+
+The overall idea behind this application is to provide a variety of alerts that detect issues, or potential issues, within the Splunk log files and then advise via an alert that this has occurred
+This application was built because there were a variety of messages in the Splunk console and log files that, if acted upon, could have prevented issues within the environment.
+
+The original presentation is available as a [recording](http://conf.splunk.com/files/2017/recordings/howd-you-get-so-big-tips-n-tricks-for-growing-your-splunk-deployment-from-50-gb-per-day-to-1-tb-per-day.mp4) or [PDF](http://conf.splunk.com/files/2017/slides/howd-you-get-so-big-tips-tricks-for-growing-your-splunk-deployment-from-50-gb-day-to-1-tb-day.pdf)
+The PowerPoint, should it be required, is available [here](https://github.com/gjanders/splunkconf2017)
+
+Since many of the potential alerts may not apply to your environment, this application has all alerts disabled by default; post-installation, once the required macros are configured, you can enable the alerts you wish to use and add the required actions
+
+There are also dashboards for investigating indexer performance, heavy forwarder queue usage and data model acceleration issues, among other items that may be of interest to a Splunk admin
+
+Please note that all alerts & dashboards were tested on Linux-based Splunk infrastructure, with AIX, Linux and Windows forwarders
+
+If you are running your Splunk Enterprise installation on Windows, or have customised your installation directory, you will need to customise some of the macros such as `splunkadmins_splunkd_source` to point to the correct splunkd log file location
+
+Also note that this application contains a very large number of alerts; you may wish to utilise the `allow_skew` setting in savedsearches.conf to allow the scheduler to balance out the execution times of the scheduled alerts
+
+Finally, the application has evolved over the years; more recent releases include very generic alerts such as `AllSplunkEnterpriseLevel - Splunkd Log Messages Admins Only`, which is designed as a "catch all" to cover many splunkd log messages. The older alerts are very specific, as the team I worked in was new to Splunk and required a more specific outcome/action based on each alert
+Feel free to use either, and feedback or contributions via github or email are always welcome
+
+## Macros - required configuration
+The various saved searches and dashboards use macros within their searches; you will need to update the macros to ensure the searches/dashboards work as expected
+To check the contents of the macros in Splunk 7 or newer, use CTRL-SHIFT-E within the search window
+
+The macros are listed below; many expect a `host=A OR host=B` style value to assist in narrowing down a search, while others expect only a single value. Note that `splunk_server` values are always lower-case and case-sensitive!
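+As a rough illustration only (not the app's shipped defaults; the hostnames and path below are placeholders), overriding a few of these macros in a `local/macros.conf` might look like this:
+
+```
+[indexerhosts]
+definition = (host=indexer1 OR host=indexer2)
+
+[searchheadhosts]
+definition = (host=searchhead1 OR host=searchhead2)
+
+[splunkindexerhostsvalue]
+definition = (splunk_server=indexer*)
+
+[splunkadmins_splunkd_source]
+definition = source="/opt/splunk/var/log/splunk/splunkd.log*"
+```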
+
+`indexerhosts` - a `host=...` list of your indexers (for example `host=indexer1 OR host=indexer2`)
+
+`heavyforwarderhosts` - a `host=...` list of your heavy forwarders (for example `host=heavyforwarder1 OR host=heavyforwarder2`)
+
+`searchheadhosts` - a `host=...` list of your search head(s) (for example `host=searchhead1 OR host=searchhead2`)
+
+`localsearchheadhosts` - a `host=...` list of the search head(s) within the cluster that these alerts are running on
+
+`splunkenterprisehosts` - a `host=...` list of any Splunk enterprise instance (for example `host=indexer1 OR host=searchhead1 OR ...`)
+
+`deploymentserverhosts` - a `host=...` list of deployment server(s) (for example `host=splunkdeploymentserver`)
+
+`licensemasterhost` - a `host=...` entry for the license master server (for example `host=splunklicensemaster`)
+
+`searchheadsplunkservers` - a `splunk_server=...` list of any Splunk search head hosts (for example `splunk_server=searchhead*`)
+
+`splunkindexerhostsvalue` - a `splunk_server=...` list of any Splunk indexer hosts (for example `splunk_server=indexer*`), or a `splunk_server_group=indexer_group`
+
+`splunkadmins_splunkd_source` - this defaults to `source=*splunkd.log`; for a slight improvement in performance you can make this a specific file such as `/opt/splunk/var/log/splunk/splunkd.log`
+
+`splunkadmins_splunkuf_source` - this defaults to `source=*splunkd.log`; you may wish to narrow down this location if your splunkd logs on universal forwarders have consistent installation directories
+
+`splunkadmins_mongo_source` - this defaults to `source=*mongod.log`; for a slight improvement in performance you can make this a specific file such as `/opt/splunk/var/log/splunk/mongod.log`
+
+`splunkadmins_clustermaster_oshost` - a `host=...` entry for the cluster master server (for example `host=splunkclustermaster`)
+
+The macros are used in various alerts which you can optionally enable; these alerts will only raise a triggered alert, as email actions are not allowed for Splunk app certification purposes
+The macros are also used in the dashboards for this application
+
+There are also other macros you might want to consider editing before enabling the alerts, for example `splunkadmins_replicationfactor`.
+
+The vast majority of the alerts also have one or more macros which you can customise to tweak the search results; for example, the macro `splunkadmins_weekly_truncated` allows the alert `IndexerLevel - Weekly Truncated Logs Report` to be customised without changing the alert itself. This makes upgrading to a new version of this app more straightforward
+I have attempted to provide a macro in any alert where I deemed it appropriate; feedback is welcome for any alert that you believe should have a macro or requires further improvement
+
+## Installation
+The application is designed to work on a search head or search head cluster instance; installation on the indexing tier is not required. You may wish to use your monitoring console server as the search head to run this app on (as it will have `splunk_server_groups` configured for your environment).
+There are a few searches that use REST API calls which are specific to the search head cluster they run on. These alerts will have to be placed on each search head or search head cluster; alternatively, any server with the required search peers will also work (a rough sketch of the idea follows the list below). The relevant alerts are:
+- `SearchHeadLevel - Accelerated DataModels with All Time Searching Enabled`
+- `SearchHeadLevel - Realtime Scheduled Searches are in use`
+- `SearchHeadLevel - Realtime Search Queries in dashboards`
+- `SearchHeadLevel - Scheduled Searches without a configured earliest and latest time`
+- `SearchHeadLevel - Scheduled searches not specifying an index`
+- `SearchHeadLevel - Scheduled searches not specifying an index macro version`
+- `SearchHeadLevel - Scheduled Searches Configured with incorrect sharing`
+- `SearchHeadLevel - Saved Searches with privileged owners and excessive write perms`
+- `SearchHeadLevel - User - Dashboards searching all indexes`
+- `SearchHeadLevel - User - Dashboards searching all indexes macro version`
+- `SearchHeadLevel - Users exceeding the disk quota` (the recent jobs list uses a REST call so you may need to adjust the search); `SearchHeadLevel - Users exceeding the disk quota introspection` is a non-search-head-specific alternative
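+
+To illustrate the kind of REST-based check these alerts perform, here is a rough approximation (not the app's exact query) of the idea behind `SearchHeadLevel - Scheduled Searches without a configured earliest and latest time`:
+
+```
+| rest /servicesNS/-/-/saved/searches splunk_server=local
+| search is_scheduled=1 disabled=0 (NOT dispatch.earliest_time=* OR NOT dispatch.latest_time=*)
+| table title, eai:acl.app, eai:acl.owner, cron_schedule
+```
+
+Because `| rest` with `splunk_server=local` only queries the instance the search runs on, checks like this are inherently specific to each search head or search head cluster.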
+
+The following reports are also specific to a search head or search head cluster:
+- `SearchHeadLevel - Alerts that have not fired an action in X days`
+- `SearchHeadLevel - Data Model Acceleration Completion Status`
+- `SearchHeadLevel - Macro report`
+- `What Access Do I Have?`
+
+The following dashboards are search head or search head cluster specific:
+- `Data Model Rebuild Monitor`
+- `Data Model Status`
+
+The following reports / alerts must either run on the cluster master or on a server where the cluster master is a search peer:
+- `ClusterMasterLevel - Per index status`
+- `ClusterMasterLevel - Primary bucket count per peer`
+
+## Dependencies
+This application is designed to work independently of other Splunk applications; however, there are a few reports and dashboards that rely on external apps to work as expected, these include:
+
+Dashboards:
+- `splunk_forwarder_data_balance_tuning` - the ScatterPlot visualization relies on the Splunk MLTK (Machine Learning Toolkit); note the dashboard works without this as well
+
+Alerts/Reports - the following will have more accurate search results if base64 decoding is available:
+- `IndexerLevel - RemoteSearches Indexes Stats Wilcard`
+- `IndexerLevel - RemoteSearches Indexes Stats`
+- `SearchHeadLevel - audit logs showing all time searches`
+- `SearchHeadLevel - platform_stats access summary`
+- `SearchHeadLevel - Script failures in the last day`
+- `SearchHeadLevel - Search Queries summary exact match`
+- `SearchHeadLevel - Search Queries summary non-exact match`
+- `SearchHeadLevel - Searches dispatched as owner by other users`
+- `SearchHeadLevel - Search Messages field extractor slow`
+- `SearchHeadLevel - Search Messages user level`
+- `SearchHeadLevel - Search Messages admins only`
+- `SearchHeadLevel - SmartStore cache misses - dashboards`
+- `SearchHeadLevel - SmartStore cache misses - savedsearches`
+- `SearchHeadLevel - SmartStore cache misses - combined`
+
+For the base64 decoding, [decrypt2 github](https://github.com/gjanders/decrypt2) or [decrypt2 SplunkBase](https://splunkbase.splunk.com/app/5565/) can work for this situation; you will need to update the macro `base64decode` in this app once decrypt2 is installed
+
+The following alerts/reports require the [TA-Alerts for SplunkAdmins github](https://github.com/gjanders/TA-SplunkAdmins/) or [TA-Alerts for SplunkAdmins splunkbase](https://splunkbase.splunk.com/app/6518/) add-on to work as expected:
+- `IndexerLevel - RemoteSearches Indexes Stats Wilcard`
+- `SearchHeadLevel - Search Queries summary non-exact match`
+- `SearchHeadLevel - Dashboards using depends and running searches in the background`
+
+## Using the application
+Once the application is installed, **all** alerts are disabled by default and you can enable those you require or want to test in your local environment
+If you choose not to customise the macros then many searches will search for all hosts, which will make the alerts and dashboards inaccurate!
+
+## Which alerts should be enabled?
+The alerts are all useful for detecting a variety of different scenarios which may or may not be applicable within your Splunk environment; in many ways this application has evolved into a library of possible alerts and explanations of alerts, so it does not make sense to turn on all of the alerts, as some overlap
+
+The description field has an (extremely) simple way of indicating if an alert will require action; there are three levels:
+ - Low - the alert is informational and likely relates to a potential issue; these alerts may produce false alarms
+ - Moderate - the alert is a warning and most likely further action will need to be taken; a moderate chance of false alarms
+ - High - the alert likely relates to something that requires action and there is a very low chance that this will create false alarms
+
+I do not have a nice way to auto-enable various alerts other than editing the local/savedsearches.conf or using the GUI; any contribution of a setup file would be welcome here!
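+
+For reference, enabling an individual alert by hand in `local/savedsearches.conf` looks roughly like the following sketch (the stanza name is one of this app's alerts; the email address is a placeholder and any alert action could be used):
+
+```
+[AllSplunkEnterpriseLevel - Splunkd Log Messages Admins Only]
+disabled = 0
+# optionally let the scheduler spread execution times, as mentioned earlier
+allow_skew = 10m
+action.email = 1
+action.email.to = splunk-admins@example.com
+```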
+
+## How is this application used?
+In the current environment the vast majority of the alerts are enabled to detect issues; they raise automated tickets or emails depending on the urgency of the specific alert.
+There are a few environment characteristics that may require changes to the way the app is used, and feedback is welcome if there is a nicer way to structure the alerts/application
+The overall assumption is that the admin(s) are not carefully watching the splunkd logs or the messages in the console of the monitoring server/Splunk servers
+
+## How is this application tested?
+Before 2019 the universal forwarders in use were installed on a mix of Windows, Linux & AIX servers; in 2019 and beyond the testing scope was vastly reduced to focus primarily on Splunk enterprise servers
+All heavy forwarders and Splunk enterprise installations are Linux-based; while I expect the alerts would work in a Windows-based environment with only changes to macros.conf, this remains untested
+The test environment for this application has a single indexer cluster and two search head clusters
+
+## Why was this application and associated conf talk created?
+Inspired by articles such as "Things I wish I knew then" and knowledge collected from various conference replays, SplunkAnswers, 200+ support tickets & nearly four years of working on a Splunk environment, I decided to share what I have learned in an attempt to prevent others from repeating the same mistakes
+There are many Splunk conf talks available on this subject in various conference replays; however, my goal was to provide practical steps to implement the ideas. That is why this application exists
+
+## Which alerts are best suited to automation?
+The following alerts are all well suited to an automated email using the `sendresults` command (or a similar function), as they involve end-user configuration which the individual can change/fix:
+- `SearchHeadLevel - Scheduled searches not specifying an index`
+- `SearchHeadLevel - Scheduled Searches Configured with incorrect sharing`
+- `SearchHeadLevel - Splunk login attempts from users that do not have any LDAP roles`
+- `SearchHeadLevel - Scheduled Searches That Cannot Run`
+- `SearchHeadLevel - Scheduled Searches without a configured earliest and latest time`
+- `SearchHeadLevel - Users exceeding the disk quota`
+- `SearchHeadLevel - Users with auto-finalized searches`
+- `SearchHeadLevel - User - Dashboards searching all indexes`
+- `SearchHeadLevel - Detect Excessive Search Use - Dashboard - Automated`
+- `SearchHeadLevel - Detect lookups that have not being accessed for a period of time`
+- `SearchHeadLevel - WLM aborted searches`
+- `SearchHeadLevel - Dashboards with all time searches set`
+- `SearchHeadLevel - SavedSearches using special characters`
+- `SearchHeadLevel - Dashboards using special characters`
+- `SearchHeadLevel - Dashboards using depends and running searches in the background`
+- `SearchHeadLevel - Summary searches using realtime search scheduling`
+- `SearchHeadLevel - Searches dispatched as owner by other users`
+- `SearchHeadLevel - Search Messages user level`
+- `SearchHeadLevel - audit logs showing all time searches`
+- `SearchHeadLevel - summary indexing searches not using durable search`
+
+## Which alerts and reports have been tested on the newer Splunk versions such as 8.2 or 9.0?
+This application was first created in 2017 and both Splunk and the application have evolved during this time period. This application is a library of potential alerts that could be used in a Splunk environment, so it would never be a good idea to turn on all alerts from this application.
+
+The below alerts and reports have been actively used since version 8.0.x, through 8.2.x and eventually 9.0.x:
+- `AllSplunkEnterpriseLevel - error in stdout.log`
+- `AllSplunkEnterpriseLevel - Email Sending Failures`
+- `AllSplunkEnterpriseLevel - Losing Contact With Master Node`
+- `AllSplunkEnterpriseLevel - Replication Failures`
+- `AllSplunkEnterpriseLevel - Splunk Scheduler skipped searches and the reason`
+- `AllSplunkEnterpriseLevel - Splunkd Crash Logs Have Appeared in Production`
+- `AllSplunkEnterpriseLevel - Splunkd Log Messages Admins Only`
+- `AllSplunkLevel - Data Loss on shutdown`
+- `AllSplunkLevel - TailReader Ignoring Path`
+- `AllSplunkLevel - Time skew on Splunk Servers`
+- `AllSplunkLevel - Unexpected termination of a Splunk process unix`
+- `ClusterMasterLevel - excess buckets on master`
+- `DeploymentServer - Error Found On Deployment Server`
+- `ForwarderLevel - Channel churn issues`
+- `ForwarderLevel - Data dropping duration`
+- `ForwarderLevel - File Too Small to checkCRC occurring multiple times`
+- `ForwarderLevel - Splunk HEC issues`
+- `IndexerLevel - Buckets have being frozen due to index sizing SmartStore`
+- `IndexerLevel - Connection errors to SmartStore`
+- `IndexerLevel - ClusterMaster Advising SearchOrRep Factor Not Met`
+- `IndexerLevel - Data parsing error`
+- `IndexerLevel - events per second benchmark`
+- `IndexerLevel - IndexConfig Warnings from Splunk indexers`
+- `IndexerLevel - Indexer Queues May Have Issues`
+- `IndexerLevel - Indexer replication queue issues to some peers`
+- `IndexerLevel - Peer will not return results due to outdated generation`
+- `IndexerLevel - platform_stats.counters hosts`
+- `IndexerLevel - platform_stats.counters hosts 24hour`
+- `IndexerLevel - platform_stats.indexers stddev measurement`
+- `IndexerLevel - platform_stats.indexers stddev incoming measurement`
+- `IndexerLevel - platform_stats.indexers totalgb measurement`
+- `IndexerLevel - platform_stats.indexers totalgb_thruput measurement`
+- `IndexerLevel - replicationdatareceiverthread close to 100% utilisation`
+- `IndexerLevel - RemoteSearches find datamodel acceleration with wildcards`
+- `IndexerLevel - RemoteSearches Indexes Stats`
+- `IndexerLevel - RemoteSearches Indexes Stats Wilcard`
+- `IndexerLevel - RemoteSearches - lookup usage`
+- `IndexerLevel - Search Failures`
+- `IndexerLevel - Slow peer from remote searches`
+- `IndexerLevel - strings_metadata triggering bucket rolling`
+- `MonitoringConsole - Check OS ulimits via REST`
+- `MonitoringConsole - Core dumps have appeared on the filesystem`
+- `MonitoringConsole - Crash logs have appeared on the filesystem`
+- `MonitoringConsole - one or more servers require configuration`
+- `MonitoringConsole - one or more servers require configuration automated`
+- `SearchHeadLevel - audit.log - lookup usage`
+- `SearchHeadLevel - authorize.conf settings will prevent some users from appearing in the UI`
+- `SearchHeadLevel - Captain Switchover Occurring`
+- `SearchHeadLevel - Dashboards invalid character in splunkd`
+- `SearchHeadLevel - Dashboards using special characters`
+- `SearchHeadLevel - Dashboards with all time searches set`
+- `SearchHeadLevel - datamodel errors in splunkd`
+- `SearchHeadLevel - Detect bundle pushes no longer occurring`
+- `SearchHeadLevel - Detect Excessive Search Use - Dashboard - Automated`
+- `SearchHeadLevel - Detect lookups that have not being accessed for a period of time`
+- `SearchHeadLevel - Detect MongoDB errors`
+- `SearchHeadLevel - Detect searches hitting corrupt
buckets` +- `SearchHeadLevel - dispatch metadata files may need removal` +- `SearchHeadLevel - Excessive REST API usage` +- `SearchHeadLevel - Knowledge Bundle contents` +- `SearchHeadLevel - KVStore Or Conf Replication Issues Are Occurring` +- `SearchHeadLevel - license usage per sourcetype per index` +- `SearchHeadLevel - Lookup Editor lookup updates` +- `SearchHeadLevel - Lookup file owners` +- `SearchHeadLevel - Lookups within dashboards` +- `SearchHeadLevel - Lookups within savedsearches` +- `SearchHeadLevel - macros in use` +- `SearchHeadLevel - Peer timeouts or authentication issues` +- `SearchHeadLevel - platform_stats access summary` +- `SearchHeadLevel - platform_stats.audit metrics api` +- `SearchHeadLevel - platform_stats.audit metrics searches` +- `SearchHeadLevel - platform_stats.audit metrics users` +- `SearchHeadLevel - platform_stats.audit metrics users 24hour` +- `SearchHeadLevel - platform_stats.remote_searches metrics populating search` +- `SearchHeadLevel - platform_stats.remote_searches metrics populating search 24 hour` +- `SearchHeadLevel - platform_stats.user_stats.introspection metrics populating search` +- `SearchHeadLevel - platform_stats.users dashboards` +- `SearchHeadLevel - platform_stats.users savedsearches` +- `SearchHeadLevel - RMD5 to savedsearch_name lookupgen report` +- `SearchHeadLevel - REST API usage via audit.log` +- `SearchHeadLevel - savedsearches invalid character in splunkd` +- `SearchHeadLevel - SavedSearches using special characters` +- `SearchHeadLevel - Scheduled Searches That Cannot Run` +- `SearchHeadLevel - Script failures in the last day` +- `SearchHeadLevel - Search Messages admins only` +- `SearchHeadLevel - Search Messages user level` +- `SearchHeadLevel - Search Queries summary exact match` +- `SearchHeadLevel - Search Queries summary non-exact match` +- `SearchHeadLevel - SHC Captain unable to establish common bundle` +- `SearchHeadLevel - Splunk alert actions exceeding the max_action_results limit` +- `SearchHeadLevel - Splunk Scheduler logs have not appeared in the last` +- `SearchHeadLevel - summary indexing searches not using durable search` +- `SearchHeadLevel - Users exceeding the disk quota` +- `syslog-ng - cache statistics summary` + +## KVStore Usage +Some CSV lookups are now replaced with kvstore entries due to the ability to sync the kvstore across multiple search head or search head cluster(s) via apps like [KV Store Tools Redux](https://splunkbase.splunk.com/app/5328/) + +## platform_stats reports +There are a number of reports with the keyword "platform_stats" in the title, these were designed to run mcollect commands (or to use summary indexing and durable search) to collect data into a metrics index +The metrics then contain detailed information around the number of users using Splunk per-search head cluster, data indexed at the indexing tier, resource usage per user et cetera. 
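+
+As a rough sketch of the mcollect pattern these populating searches follow (the metric name and the `my_metrics` index below are placeholders; the app's real reports collect considerably more fields):
+
+```
+index=_audit action=search info=completed earliest=-1h
+| bin _time span=1h
+| stats dc(user) AS active_users BY _time, host
+| eval metric_name="platform_stats.users.active", _value=active_users
+| fields _time, host, metric_name, _value
+| mcollect index=my_metrics
+```
+
+The collected data points can then be queried from the metrics index with `mstats`.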
+There is plenty of detail in here, but dashboards were not included for the information built from them; contributions welcome
+
+## Detecting which indexes are searched by Splunk users
+As of version 8.0.8 there is still no accurate way to detect which indexes were searched by a user based on their level of access; the audit logs simply do not record which indexes were accessed
+Therefore the following searches are part of this app to help achieve this goal:
+- `SearchHeadLevel - Search Queries summary exact match`
+- `SearchHeadLevel - Search Queries summary non-exact match`
+
+As per the searches' descriptions, they both require other reports such as `SearchHeadLevel - Macro report`; the description of each search details the various reports they rely on to make them work.
+
+However, these complicated searches are not 100% accurate; alternative searches exist in this app to work at the indexing tier:
+- `IndexerLevel - RemoteSearches Indexes Stats`
+- `IndexerLevel - RemoteSearches Indexes Stats Wilcard`
+
+The `remote_searches.log` at the indexing tier does not (usually) require macro substitution, but you do not have information about the user that ran the searches, so this approach is more likely to overcount index access than the search tier version; it is also less likely to miss an index due to macro usage or similar...
+
+In more detail, the challenges with the search head level's `audit.log` searches are (a simplified sketch follows this list):
+- You cannot determine which index was used if multiple indexes were specified; for example, if a search such as `index=A OR index=B` returns more than 0 results, you cannot be sure which index returned the results, so both are recorded by the searches in this app
+- Macros, eventtypes, tags and datamodels are recorded in the `audit.log`, so you need to substitute the macro/eventtype/tag to correctly determine if an index is in use; to make this more complicated, macros can be nested, so a macro may refer to another macro and the 2nd or 3rd macro may contain the `index=` information
+- There are many ways to search an index, such as `index= ""` or `index IN (...)`; the regexes attempt to deal with the various straightforward scenarios such as `NOT index=A index=B`, but it is not straightforward to correctly extract index names from the `audit.log` in all scenarios
+- The `audit.log` information for ad-hoc searches does not record app context; therefore, even if you know the macro and user information, you cannot be sure which app the search was run from, and therefore you cannot correctly substitute the macro/tag/eventtype information
+- The queries I have built search for a `scan_count` of > 0; this way `index=randomstring` doesn't appear as an index access, however if a search is trying to use a valid index and the `scan_count` is 0 the search is not counted (this would likely be an edge case)...
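+
+To make the extraction challenge concrete, a stripped-down sketch of the audit.log approach might look like this (illustrative only; it ignores macros, `index IN (...)`, `NOT` clauses and most quoting variants that the app's real queries handle):
+
+```
+index=_audit action=search info=completed search=* scan_count>0
+| rex field=search max_match=50 "index\s*=\s*\"?(?<index_name>[\w*-]+)\"?"
+| mvexpand index_name
+| stats count BY user, index_name
+```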
+
+At the indexing tier the `remote_searches.log` file has different challenges:
+- While macros, eventtypes and tags are expanded (in most cases; there are bugs that allow macros to reach the indexing tier), you instead lose the user context in cases such as ad-hoc searches. This means that a search like `index=*` run by a user with permissions to access 1 index will appear, to these searches, to be accessing all indexes. The current implementation of the RemoteSearches queries in this app assumes access to all indexes if the username is unknown (which may result in excess matching rather than missing searches)
+- App context is again missing for ad-hoc searches, although this is less important at the indexing tier
+- You cannot determine which index was used if multiple indexes were specified; for example, if a search such as `index=A OR index=B` returns more than 0 results, you cannot be sure which index returned the results, so both are recorded by searches in this app
+- If the log line is very long it is truncated with a message similar to `...{skipping 46464 bytes}...`; this often results in the last `index=` in the log getting truncated, and some `index=` strings from the search will not appear in the logs at all (for example in a datamodel acceleration search with many indexes listed)
+- Note that searches with a `scan_count` of 0 are counted; there is an additional metric to measure scan count if you wish to find only indexes that are scanning more than 0 data
+
+Either way, the search head level version seems to be "good enough" to determine who is searching which index in most cases; the RemoteSearches queries cover some of the edge cases, but the count will generally be higher than expected. The below ideas require more votes if these issues are important to you.
+
+The following ideas relate to this issue:
+[Better audit logs](https://ideas.splunk.com/ideas/E-I-49)
+[Provide index access statistics to assist in capacity planning of the indexing tier](https://ideas.splunk.com/ideas/E-I-38)
+
+## Which searches require the TA-Alerts for SplunkAdmins add-on?
+- `IndexerLevel - RemoteSearches Indexes Stats Wilcard`
+- `SearchHeadLevel - Search Queries summary non-exact match`
+- `SearchHeadLevel - Dashboards using depends and running searches in the background`
+
+## Other notes
+### search_id's
+The macro `search_type_from_sid` attempts to determine the search "type" based on the search id, and this worked quite well in older versions.
+There are many variations which the macro doesn't classify, as they are effectively ad-hoc searches in my understanding; these include:
+- md_ for metadata searches
+- ta_ for typeahead searches
+- sd_ (appears to be another kind of ad-hoc search)
+- rt_ for realtime searches
+
+In 9.1.3 the search_id pattern appears to have changed (or at least I had not noticed this change before 9.1.3); there are now search id's that start with:
+- deep-dive-
+- degraded-entities
+- episode-review-
+- event_management_query
+- health-score-tile-search
+- health-score-tree-base
+- kpi-health-score-sparklines
+- notable-events-search
+- service-health-score
+- side-kpi-table
+- single-thresholding-preview
+- common-fields-search
+- event-management-detail
+- get-block-listed-fields
+- impact-services-search
+- time-variant-preview
+- trending-ad-analysis
+- trending-ad-mad-analysis
+
+These appear to be from premium apps, but it does imply that there is a mechanism to customise the search_id's...
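+
+As a much-simplified sketch of the sid-prefix classification idea behind `search_type_from_sid` (the real macro handles many more patterns; the prefixes below are only some common ones, so treat this as illustrative):
+
+```
+index=_audit action=search info=completed
+| eval search_type=case(
+    match(search_id, "^'?scheduler_"), "scheduled",
+    match(search_id, "^'?rt_"), "realtime",
+    match(search_id, "^'?subsearch_"), "subsearch",
+    match(search_id, "^'?md_"), "metadata",
+    match(search_id, "^'?ta_"), "typeahead",
+    true(), "adhoc/other")
+| stats count BY search_type
+```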
+
+## Feedback?
+Feel free to open an issue on github or use the "contact author" option on the SplunkBase link and I will try to get back to you when possible, thanks!
+
+## Release Notes
+### 4.0.1
+New dashboard:
+- `heavy_forwarder_analysis` - as found in the conf24 presentation PLA1509B
+
+New reports:
+- `SearchHeadLevel - Job performance data per indexer handoff time`
+- `SearchHeadLevel - KVStore collection size`
+- `SearchHeadLevel - Savedsearches with schedules and no next_scheduled_time`
+
+Updated alerts:
+- `AllSplunkEnterpriseLevel - Splunkd Log Messages Admins Only` - search updates
+- `AllSplunkEnterpriseLevel - Email Sending Failures` - added app context
+- `IndexerLevel - These Indexes Are Approaching The warmDBCount limit` - added datatype=all argument
+- `IndexerLevel - Cold data location approaching size limits` - added datatype=all argument
+- `IndexerLevel - Unclean Shutdown - Fsck` - added datatype=all argument
+- `SearchHeadLevel - Peer timeouts or authentication issues` - updated to use the splunkd source
+- `SearchHeadLevel - Splunk alert actions exceeding the max_action_results limit` - excluded summary indexing
+- `SearchHeadLevel - Scheduled Searches without a configured earliest and latest time` - rewrote search for efficiency
+- `SearchHeadLevel - Search Messages user level` - search updates
+- `SearchHeadLevel - Search Messages admins only` - search updates
+
+Updated dashboards:
+- `splunk_forwarder_output_tuning` - updated comments, removed heartbeatFrequency
+
+Updated macros:
+- `search_type_from_sid` - minor tweaks to regex
+
+Updated reports:
+- `SearchHeadLevel - indexes per savedsearch` - corrected typo on multisearch, re-wrote parts of the query to include subsearches as well
+- `SearchHeadLevel - Indexes for savedsearch without subsearches` - corrected typo on multisearch
+- `SearchHeadLevel - Search Queries summary non-exact match` - added delim for index IN (a b c), corrected typo on multisearch, updated description to link to https://github.com/TheWoodRanger/presentation-conf_24_audittrail_native_telemetry
+- `SearchHeadLevel - Search Queries summary exact match` - added delim for index IN (a b c), corrected typo on multisearch, updated description to link to https://github.com/TheWoodRanger/presentation-conf_24_audittrail_native_telemetry
+
+Also updated the navigation menu.
+
+### 4.0.0
+- Merged pull request from sifters relating to replacing the comment macro with the triple backtick option introduced in Splunk 8.1. This involved editing many searches to change the format of the comments.
+
+New reports:
+- `SearchHeadLevel - configtracker index example2`
+
+The version number has moved to 4.0.0 as this change has the potential to introduce issues with the change of comment syntax.
+I've completed multiple reviews and I believe there should be no broken alerts, but please report any via the "contact author" option if you find one
+
+This version removes compatibility with Splunk versions below 8.1 due to the use of the newer comment syntax
+
+### 3.0.14
+New reports:
+- `SearchHeadLevel - Lookup definitions with no lookup file or kvstore collection`
+- `SearchHeadLevel - User created kvstore collections`
+- `SearchHeadLevel - Search Queries summary loadjob and savedsearch usage in audit logs`
+
+Updated alerts:
+- `AllSplunkEnterpriseLevel - Splunkd Log Messages Admins Only`
+- `SearchHeadLevel - Detect bundle pushes no longer occurring`
+- `SearchHeadLevel - macros in use`
+- `SearchHeadLevel - Search Messages user level`
+
+Updated reports:
+- `SearchHeadLevel - audit.log - lookup usage` - added regex as the search field sometimes doesn't auto-extract correctly
+- `SearchHeadLevel - Detect lookups that have not being accessed for a period of time` - added automatic lookups in
+- `SearchHeadLevel - platform_stats access summary` - criteria update
+- `SearchHeadLevel - Lookup file owners` - corrections to ensure that automatic lookups are not included
+- `SearchHeadLevel - Search Queries summary non-exact match` - minor criteria update
+
+### 3.0.13
+New reports:
+- `IndexerLevel - events per second benchmark`
+- `IndexerLevel - savedsearches by indexer execution time`
+- `SearchHeadLevel - indexes per savedsearch`
+- `SearchHeadLevel - macros in use`
+- `SearchHeadLevel - Indexes for savedsearch without subsearches`
+- `SearchHeadLevel - platform_stats.remote_searches metrics populating search 24 hour`
+
+Updated alerts:
+- `AllSplunkEnterpriseLevel - Splunkd Log Messages Admins Only` - updated criteria
+- `IndexerLevel - RemoteSearches find datamodel acceleration with wildcards` - updated regex
+- `MonitoringConsole - one or more servers require configuration` - changed criteria
+- `MonitoringConsole - one or more servers require configuration automated` - rewrote the alert
+- `SearchHeadLevel - Indexer Peer Connection Failures` - updated comments
+- `SearchHeadLevel - Detect searches hitting corrupt buckets` - updated comments
+- `SearchHeadLevel - Users with auto-finalized searches` - updated comments
+- `SearchHeadLevel - splunk_search_messages dispatch` - updated comments
+- `SearchHeadLevel - Lookups within savedsearches` - corrected URL
+- `SearchHeadLevel - Sourcetypes usage from search telemetry data` - description update
+- `SearchHeadLevel - Jobs endpoint example` - updated description
+- `SearchHeadLevel - SmartStore cache misses - dashboards` - minor update to regex
+- `SearchHeadLevel - SmartStore cache misses - combined` - minor update to regex
+- `SearchHeadLevel - Search Messages field extractor slow` - updated comments
+- `SearchHeadLevel - Search Messages user level` - updated comments
+- `SearchHeadLevel - Search Messages admins only` - updated criteria and comments
+
+Updated reports:
+- `IndexerLevel - RemoteSearches - lookup usage` - typo fixed in description
+- `IndexerLevel - Report on bucket corruption` - updated comments
+- `SearchHeadLevel - summary indexing searches not using durable search` - corrected REST context
+- `SearchHeadLevel - Lookups within savedsearches` - corrected REST context
+- `SearchHeadLevel - platform_stats.audit metrics users` - added v2/v1 endpoints for search/jobs/export
+- `SearchHeadLevel - platform_stats.audit metrics api` - added v2/v1 endpoints for search/jobs/export
+- `SearchHeadLevel - platform_stats.audit
metrics users 24hour` - added v2/v1 endpoints for search/jobs/export + +Updated to use macro `splunkadmins_clustermaster_host` instead of splunk_server=local: +- `ClusterMasterLevel - Primary bucket count per peer` +- `ClusterMasterLevel - excess buckets on master` +- `IndexerLevel - ClusterMaster Advising SearchOrRep Factor Not Met` + +Updated to use macro `splunkadmins_restmacro` instead of splunk_server=local: +- `IndexerLevel - Indexer replication queue issues to some peers` +- `SearchHeadLevel - Alerts that have not fired an action in X days` +- `SearchHeadLevel - Accelerated DataModels Access Info` +- `SearchHeadLevel - Accelerated DataModels with wildcard or no index specified` +- `SearchHeadLevel - authorize.conf settings will prevent some users from appearing in the UI` +- `SearchHeadLevel - Data Model Acceleration Completion Status` +- `SearchHeadLevel - DataModel Fields` +- `SearchHeadLevel - Dashboard refresh intervals` +- `SearchHeadLevel - Dashboards using depends and running searches in the background` +- `SearchHeadLevel - Dashboards using special characters` +- `SearchHeadLevel - Dashboards with all time searches set` +- `SearchHeadLevel - Dashboards that may benefit from base or post-process searches` +- `SearchHeadLevel - DataModels report` +- `SearchHeadLevel - Disabled modular inputs are running` +- `SearchHeadLevel - Detect changes to knowledge objects non-directory` +- `SearchHeadLevel - EventTypes report` +- `SearchHeadLevel - Index access list by user` +- `SearchHeadLevel - IndexesPerUser Report` +- `SearchHeadLevel - Knowledge bundle status on indexers` +- `SearchHeadLevel - Lookup file owners` +- `SearchHeadLevel - Lookup CSV size` +- `SearchHeadLevel - Macro report` +- `SearchHeadLevel - platform_stats.users savedsearches` +- `SearchHeadLevel - platform_stats.users dashboards` +- `SearchHeadLevel - Saved Searches with privileged owners and excessive write perms` +- `SearchHeadLevel - Summary searches using realtime search scheduling` +- `SearchHeadLevel - SavedSearches using special characters` +- `SearchHeadLevel - Splunk alert actions exceeding the max_action_results limit` +- `SearchHeadLevel - summary indexing searches not using durable search` +- `SearchHeadLevel - Tags report` + +Other macro updates: +- `DeploymentServer - Count by application` + +### 3.0.12 +New alerts: +- `MonitoringConsole - one or more servers require configuration` +- `MonitoringConsole - one or more servers require configuration automated` +- `SearchHeadLevel - Peer timeouts or authentication issues` + +New macros: +- `splunkadmins_macro_sub` + +New reports: +- `SearchHeadLevel - Datamodel REST endpoint indexes in use` +- `SearchHeadLevel - Job performance data per indexer` +- `SearchHeadLevel - Jobs endpoint example` +- `SearchHeadLevel - configtracker index example` + +Updated alerts: +- `AllSplunkEnterpriseLevel - Splunkd Log Messages Admins Only` - more criteria +- `SearchHeadLevel - Search Messages user level` - more criteria +- `SearchHeadLevel - Search Messages admins only` - more criteria + +Updated dashboards: +- `splunk_forwarder_output_tuning` - to reference NLB/load balanced version of asynchronous forwarding + +Updated macros: +- `whataccessdoihave` - comments and added srchIndexesDisallowed + +Updated reports: +- `SearchHeadLevel - IndexesPerRole Remote Report` - comment updates only +- `SearchHeadLevel - Lookup file owners` - comment updates only + +Alerts added to future removal list: +- `ClusterMasterLevel - Per index status` + +Updated to use 
`splunkadmins_macro_sub` macro: +- `SearchHeadLevel - Dashboards with all time searches set` +- `SearchHeadLevel - Scheduled searches not specifying an index macro version` +- `SearchHeadLevel - Search Queries By Type Audit Logs macro version` +- `SearchHeadLevel - Search Queries By Type Audit Logs macro version other` +- `SearchHeadLevel - Search Queries summary exact match` +- `SearchHeadLevel - Search Queries summary non-exact match` +- `SearchHeadLevel - User - Dashboards searching all indexes macro version` + +Misc: +- Added supported themes settings in app.conf to allow the usage of dark theme (for 9.1 enterprise users and above) + +### 3.0.11 +Updated alerts: +- `AllSplunkEnterpriseLevel - ulimit on Splunk enterprise servers is below 8192` - missing parenthesis, thanks Gregg Woodcock +- `IndexerLevel - replicationdatareceiverthread close to 100% utilisation` - incorrect macro +- `MonitoringConsole - Crash logs have appeared on the filesystem` - incorrect macro, github issue #22, thanks SANSd20 + +Added lookup file: +- `splunkadmins_indexlist_by_cluster.csv` + +### 3.0.10 +- `SearchHeadLevel - audit.log - lookup usage` - correcting issue #21 (thanks @barrettnet) + +### 3.0.9 +In version 3.0.8 the lookup file `splunkadmins_hec_reply_code_lookup.csv` was updated based on [gettingsmarter (github repo)](https://github.com/redvelociraptor/gettingsmarter/), the updated lookup was created by @jgedeon and additionally includes some health endpoint return codes (as well as those returned by the standard HEC endpoint) + +Updated alerts: +- `SplunkEnterpriseLevel - Splunkd Log Messages Admins Only` - more criteria +- `SearchHeadLevel - Scheduled Searches That Cannot Run` - correcting issue #20 (thanks @barrettnet) + +Updated reports: +- `SearchHeadLevel - Search Queries summary exact match` - added provenance +- `SearchHeadLevel - Search Queries summary non-exact match` - added provenance +- `SearchHeadLevel - audit.log - lookup usage` - updated to handle mlspl files as well (apply command) +- `SearchHeadLevel - Lookup file owners` - now includes an additional join that can be used if TA-webtools is installed (to improve accuracy/exclude default lookup definitions/files) + +New reports: +- `SearchHeadLevel - Detect lookups that have not being accessed for a period of time` +- `SearchHeadLevel - Lookup Editor lookup updates` +- `SearchHeadLevel - Lookups within dashboards` +- `SearchHeadLevel - Lookups within savedsearches` +- `SearchHeadLevel - REST API usage via audit.log` + +### 3.0.8 +New alerts: +- `SearchHeadLevel - summary indexing searches not using durable search` + +New macros: +- `indexer_cluster_name` without any parameters created as per issue #19 (barrettnet) + +New reports: +- `SearchHeadLevel - audit.log - lookup usage` +- `SearchHeadLevel - license usage per sourcetype per index` +- `SearchHeadLevel - Lookup file owners` +- `IndexerLevel - RemoteSearches - lookup usage` + +Updated alerts: +- `AllSplunkEnterpriseLevel - Splunkd Log Messages Admins Only` - more matching criteria +- `SearchHeadLevel - Scheduled Searches That Cannot Run` - as per issue #18 (AHCL1) +- `SearchHeadLevel - SHC Captain unable to establish common bundle` - additional exclusion for Splunk 9.0.x + + +Updated reports: +- `IndexerLevel - platform_stats.indexers totalgb measurement` - added * to the end of `license_usage.log`, updated `indexer_cluster_name` with parameter as per issue #19 (barrettnet) +- `IndexerLevel - platform_stats.indexers totalgb_thruput measurement` - updated `indexer_cluster_name` with 
parameter as per issue #19 (barrettnet) +- `SearchHeadLevel - Search Queries summary exact match` - removed newlines to improve accuracy +- `SearchHeadLevel - Search Queries summary non-exact match` - removed newlines to improve accuracy + +Updated recommended links in nav menu + +### 3.0.7 +New macros: +- `sysloghosts` + +New reports: +- `SearchHeadLevel - Knowledge Bundle contents` +- `syslog-ng - cache statistics summary` - as contributed by Marc Andersen, company: NIL815 ApS + +Updated dashboards: +- `splunk_forwarder_output_tuning` - added fillnull for `ingest_pipe` + +Updated alerts: +- `AllSplunkLevel - No recent metrics.log data` - updated to use prestats +- `AllSplunkLevel - TCP Output Processor has paused the data flow` - updated criteria +- `AllSplunkEnterpriseLevel - ulimit on Splunk enterprise servers is below 8192` - now 64,000 (could be renamed in future) +- `AllSplunkEnterpriseLevel - Splunkd Log Messages Admins Only` - updated criteria +- `ForwarderLevel - Splunk universal forwarders with ulimit issues` - updated keywords +- `SearchHeadLevel - Scheduled Searches That Cannot Run` - excluded the require command +- `SearchHeadLevel - Detect MongoDB errors` - updated to use prestats, added `_time` field +- `SearchHeadLevel - SHC Captain unable to establish common bundle` - added new criteria +- `SearchHeadLevel - Search Messages user level` - updated criteria + +### 3.0.6 +Updated dashboards: +- `Splunk forwarder output tuning` - added fillnull `ingest_pipe` + +Updated reports/alerts: +- `SearchHeadLevel - Dashboards using special characters` - updated to use spath command instead of rex +- `SearchHeadLevel - Search Messages user level` - excluded require command +- `IndexerLevel - RemoteSearches find all time searches` - removed keyword + +On reports/alerts: +- `IndexerLevel - RemoteSearches Indexes Stats` +- `IndexerLevel - RemoteSearches Indexes Stats Wilcard` +- `IndexerLevel - Slow peer from remote searches` +- `IndexerLevel - SmartStore cache misses - remote_searches` +- `SearchHeadLevel - platform_stats.remote_searches metrics populating search` + +Updated keywords to terminated: or closed: (previously terminated) + +On reports/alerts: +- `SearchHeadLevel - Detect Excessive Search Use - Dashboard - Automated` +- `SearchHeadLevel - platform_stats.audit metrics searches` +- `SearchHeadLevel - platform_stats.audit metrics users` +- `SearchHeadLevel - platform_stats.audit metrics users 24hour` +- `SearchHeadLevel - Search Queries By Type Audit Logs` +- `SearchHeadLevel - Search Queries By Type Audit Logs macro version` +- `SearchHeadLevel - Search Queries By Type Audit Logs macro version other` +- `SearchHeadLevel - Searches dispatched as owner by other users` +- `SearchHeadLevel - SmartStore cache misses - dashboards` +- `SearchHeadLevel - SmartStore cache misses - savedsearches` +- `SearchHeadLevel - SmartStore cache misses - combined` +- `SearchHeadLevel - Users with auto-finalized searches` + +Removed regex: +`| rex "(?s)^(?:[^'\n]*'){4},\s+\w+='(?P[\s\S]+)'\]($|\[[^\]]+\]$)"` + +As it is causing issues with max_matches, newer Splunk versions appear to accurately match the search field without this regex + +### 3.0.5 +New alerts: +- `IndexerLevel - Connection errors to SmartStore` + +New reports: +- `SearchHeadLevel - Sourcetypes usage from search telemetry data` + +Updated alerts: +- `AllSplunkEnterpriseLevel - Splunkd Log Messages Admins Only` - more matching criteria +- `ForwarderLevel - Data dropping duration` - comment update +- `SearchHeadLevel - Search Queries 
+
+On reports/alerts:
+- `SearchHeadLevel - Detect Excessive Search Use - Dashboard - Automated`
+- `SearchHeadLevel - platform_stats.audit metrics searches`
+- `SearchHeadLevel - platform_stats.audit metrics users`
+- `SearchHeadLevel - platform_stats.audit metrics users 24hour`
+- `SearchHeadLevel - Search Queries By Type Audit Logs`
+- `SearchHeadLevel - Search Queries By Type Audit Logs macro version`
+- `SearchHeadLevel - Search Queries By Type Audit Logs macro version other`
+- `SearchHeadLevel - Searches dispatched as owner by other users`
+- `SearchHeadLevel - SmartStore cache misses - dashboards`
+- `SearchHeadLevel - SmartStore cache misses - savedsearches`
+- `SearchHeadLevel - SmartStore cache misses - combined`
+- `SearchHeadLevel - Users with auto-finalized searches`
+
+Removed the regex:
+`| rex "(?s)^(?:[^'\n]*'){4},\s+\w+='(?P<search>[\s\S]+)'\]($|\[[^\]]+\]$)"`
+
+as it was causing issues with max_matches; newer Splunk versions appear to accurately match the search field without this regex
+
+### 3.0.5
+New alerts:
+- `IndexerLevel - Connection errors to SmartStore`
+
+New reports:
+- `SearchHeadLevel - Sourcetypes usage from search telemetry data`
+
+Updated alerts:
+- `AllSplunkEnterpriseLevel - Splunkd Log Messages Admins Only` - more matching criteria
+- `ForwarderLevel - Data dropping duration` - comment update
+- `SearchHeadLevel - Search Queries summary exact match` - regex updates and 1 regex removal
+- `SearchHeadLevel - Search Queries summary non-exact match` - regex updates and 1 regex removal
+
+Updated macro:
+- `splunkadmins_metrics_source` - corrected to include source=
+
+Removed app.manifest file
+
+### 3.0.4
+New alerts:
+- `IndexerLevel - Buckets have being frozen due to index sizing SmartStore`
+
+Updated alerts:
+- `AllSplunkEnterpriseLevel - Replication Failures` - comment update
+- `AllSplunkEnterpriseLevel - Splunkd Log Messages Admins Only` - additional criteria and removed SHC restart times
+- `IndexerLevel - Buckets have being frozen due to index sizing` - comment update only
+- `IndexerLevel - IndexConfig Warnings from Splunk indexers` - additional criteria
+- `SearchHeadLevel - Script failures in the last day`
+- `SearchHeadLevel - KVStore Or Conf Replication Issues Are Occurring`
+- `SearchHeadLevel - SavedSearches using special characters`
+- `SearchHeadLevel - Search Messages user level` - removed some messages from the alert
+
+### 3.0.3
+SplunkBase validation failure (wrong manifest version)
+
+### 3.0.2
+Merged a pull request from jeffland-consist via github, including various changes
+
+New alerts:
+- `IndexerLevel - replicationdatareceiverthread close to 100% utilisation`
+
+New macros:
+- `splunkadmins_metrics_source`
+- `splunkadmins_hec_metrics_source`
+
+New reports:
+- `SearchHeadLevel - Accelerated DataModels Access Info`
+- `SearchHeadLevel - Dashboards resulting in concurrency issues`
+- `SearchHeadLevel - Dashboards that may benefit from base or post-process searches`
+- `SearchHeadLevel - Searches by search type`
+
+Updated macros:
+- `search_type_from_sid`
+- `splunkadmins_splunkd_source`
+- `splunkadmins_splunkuf_source`
+- `splunkadmins_mongo_source`
+- `splunkadmins_license_usage_source`
+- `splunkadmins_deploymentserver_splunkserver`
+
+To include a trailing wildcard (so splunkd.log.1 matches or similar)
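+
+A sketch of what that trailing wildcard looks like in macros.conf (the definitions actually shipped by the app may carry extra clauses):
+
+```
+# macros.conf (sketch) - the wildcard after .log also matches rotated files such as splunkd.log.1
+[splunkadmins_splunkd_source]
+definition = source="*splunkd.log*"
+```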
+
+Updated alerts:
+- `AllSplunkEnterpriseLevel - Core Dumps Disabled` - updated matching criteria
+- `AllSplunkEnterpriseLevel - Non-existent roles are assigned to users` - updated matching criteria
+- `AllSplunkEnterpriseLevel - Splunk Servers throwing runScript errors` - updated matching criteria
+- `AllSplunkEnterpriseLevel - sendmodalert errors` - updated matching criteria
+- `AllSplunkEnterpriseLevel - Splunkd Log Messages Admins Only` - updated matching criteria
+- `AllSplunkEnterpriseLevel - Splunk Servers with resource starvation` - updated to use `splunkadmins_splunkd_source` macro
+- `AllSplunkLevel - No recent metrics.log data` - corrected comment to be after tstats, updated to use `splunkadmins_metrics_source` macro
+- `AllSplunkLevel - DeploymentServer Application Installation Error` - updated matching criteria
+- `DeploymentServer - Application Not Found On Deployment Server` - updated matching criteria
+- `ForwarderLevel - Channel churn issues` - updated to use `splunkadmins_metrics_source` macro
+- `ForwarderLevel - Forwarders connecting to a single endpoint for extended periods` - updated to use `splunkadmins_metrics_source` macro
+- `ForwarderLevel - Forwarders connecting to a single endpoint for extended periods UF level` - updated to use `splunkadmins_metrics_source` macro
+- `ForwarderLevel - Splunk HTTP Listener Overwhelmed` - updated matching criteria
+- `ForwarderLevel - Splunk Universal Forwarders Exceeding the File Descriptor Cache` - updated matching criteria
+- `ForwarderLevel - Splunk Universal Forwarders that are time shifting` - updated matching criteria
+- `ForwarderLevel - Stopping all listening ports` - updated to use `splunkadmins_splunkd_source` macro
+- `IndexerLevel - Buckets changes per day` - updated matching criteria, updated to use `splunkadmins_splunkd_source` macro
+- `IndexerLevel - Indexer Queues May Have Issues` - updated to use `splunkadmins_metrics_source` macro
+- `IndexerLevel - Knowledge bundle upload stats` - updated to use `splunkadmins_metrics_source` macro
+- `IndexerLevel - platform_stats.indexers totalgb_thruput measurement` - updated to use `splunkadmins_metrics_source` macro
+- `IndexerLevel - platform_stats.indexers stddev measurement` - updated to use `splunkadmins_metrics_source` macro
+- `IndexerLevel - platform_stats.indexers stddev incoming measurement` - updated to use `splunkadmins_metrics_source` macro
+- `IndexerLevel - Weekly Broken Events Report` - updated matching criteria
+- `IndexerLevel - Time format has changed multiple log types in one sourcetype` - updated matching criteria
+- `IndexerLevel - Buckets have being frozen due to index sizing` - updated matching criteria
+- `IndexerLevel - Unclean Shutdown - Fsck` - updated matching criteria
+- `IndexerLevel - Index not defined` - updated matching criteria
+- `IndexerLevel - Timestamp parsing issues combined alert` - updated to use `splunkadmins_splunkd_source` macro
+- `IndexerLevel - S2SFileReceiver Error` - updated matching criteria
+- `MonitoringConsole - Core dumps have appeared on the filesystem` - corrected to use `indexer_cluster_name` macro
+- `MonitoringConsole - Crash logs have appeared on the filesystem` - corrected description
+- `SearchHeadLevel - LDAP users have been disabled or left the company cleanup required` - updated matching criteria
+- `SearchHeadLevel - Long filenames may be causing issues` - updated matching criteria
+- `SearchHeadLevel - SHCluster Artifact Replication Issues` - updated matching criteria
+- `SearchHeadLevel - Captain Switchover Occurring` - updated matching criteria
+- `SearchHeadLevel - Knowledge bundle replication times metrics.log` - updated to use `splunkadmins_metrics_source` macro
+- `SearchHeadLevel - Detect bundle pushes no longer occurring` - updated to use `splunkadmins_metrics_source` macro
+- `SearchHeadLevel - WLM aborted searches` - updated matching criteria
+- `SearchHeadLevel - SHC Captain unable to establish common bundle` - updated to use `splunkadmins_splunkd_source` macro
+
+Updated dashboards:
+- `ClusterMasterJobs.xml`
+- `heavyforwarders_max_data_queue_sizes_by_name.xml`
+- `heavyforwarders_max_data_queue_sizes_by_name_v8.xml`
+- `hec_performance.xml`
+- `indexer_data_spread.xml`
+- `indexer_max_data_queue_sizes_by_name.xml`
+- `indexer_max_data_queue_sizes_by_name_v8.xml`
+- `rolled_buckets_by_index.xml`
+- `smartstore_stats.xml`
+- `splunk_forwarder_data_balance_tuning.xml`
+- `splunk_forwarder_output_tuning.xml`
+
+To use `splunkadmins_splunkd_source` and/or `splunkadmins_metrics_source` macros
+
+### 3.0.1
+New macros:
+- `splunkadmins_shutdown_time_by_period`
+
+New alerts:
+- `MonitoringConsole - Check OS ulimits via REST`
+- `SearchHeadLevel - Detect bundle pushes no longer occurring`
+
+New reports:
+- `DeploymentServer - Count by application` - contributed by @trex (radler)
+- `IndexerLevel - DataModel Acceleration - Indexes in use`
+- `SearchHeadLevel - Knowledge bundle status on indexers`
+- `SearchHeadLevel - Knowledge bundle replication times metrics.log`
+
+Updated alerts:
+- `AllSplunkEnterpriseLevel - Splunkd Log Messages Admins Only`
+
+Updated dashboards:
+- `splunk_introspection_io_stats` - updated names/description of fields used
+- `indexer_max_data_queue_sizes_by_name` - minor tweak to replication queue queries
+- `indexer_max_data_queue_sizes_by_name_v8` - minor tweak to replication queue queries
+- `splunk_forwarder_output_tuning` - comment update only
+
+Updated macros:
+- `splunkadmins_shutdown_time_by_period(4)` to work as expected
+
+Added link to Admins Little Helper for Splunk and TrackMe
+README.md improvements
+
+### 3.0.0
+
+Due to the creation of TA-Alerts for SplunkAdmins, the following are removed in this release:
+- bin directory
+- README directory
+- default/searchbnf.conf
+- default/inputs.conf
+- default/commands.conf
+
+LookupWatcher and the custom commands streamfilter and streamfilterwildcard are now moved into the new TA-Alerts for SplunkAdmins application
+
+New alerts:
+- `AllSplunkEnterpriseLevel - error in stdout.log`
+- `IndexerLevel - platform_stats.indexers stddev incoming measurement`
+- `MonitoringConsole - Core dumps have appeared on the filesystem`
+- `MonitoringConsole - Crash logs have appeared on the filesystem`
+- `SearchHeadLevel - Splunk Scheduler logs have not appeared in the last`
+
+Updated:
+- `AllSplunkEnterpriseLevel - Replication Failures` - simplified criteria to match more issues
+- `AllSplunkEnterpriseLevel - Splunkd Log Messages Admins Only` - corrected order of statements so this works as expected, added 1 more exclusion
+- `IndexerLevel - platform_stats.indexers stddev measurement` - narrowed down to sourcetype/source
+- `IndexerLevel - Search Failures` - changed criteria
+- `IndexerLevel - Indexer Queues May Have Issues` - added server count
+- `IndexerLevel - RemoteSearches Indexes Stats Wilcard` - description update as this requires TA-Alerts for SplunkAdmins
+- `SearchHeadLevel - Dashboards using depends and running searches in the background` - description update as this requires TA-Alerts for SplunkAdmins
+- `SearchHeadLevel - Detect MongoDB errors` - excluded 1 warning
+- `SearchHeadLevel - Search Queries summary exact match` - comment update
+- `SearchHeadLevel - Search Queries summary non-exact match` - comment and description update as this requires TA-Alerts for SplunkAdmins
+- `SearchHeadLevel - Search Messages user level` - removed "DAG Execution Exception"
+- `SearchHeadLevel - Search Messages admins only` - excluded "Found no results to append to collection"
+
+### 2.6.13
+Updated python SDK to 1.6.20
+
+Updates to reports/alerts:
+`IndexerLevel - Future Dated Events that appeared in the last week` - comment update
+
+`IndexerLevel - IndexConfig Warnings from Splunk indexers` - added wildcard to improve matching
+
+Updated regex to handle the index:: case:
+`IndexerLevel - RemoteSearches Indexes Stats`
+
+`IndexerLevel - RemoteSearches Indexes Stats Wilcard`
+
+`SearchHeadLevel - Determine query scan density`
+
+`SearchHeadLevel - Search Queries By Type Audit Logs`
+
+`SearchHeadLevel - Search Queries By Type Audit Logs macro version`
+
+`SearchHeadLevel - Search Queries By Type Audit Logs macro version other`
+
+`SearchHeadLevel - SmartStore cache misses - dashboards`
+
+`SearchHeadLevel - SmartStore cache misses - savedsearches`
+
+`SearchHeadLevel - SmartStore cache misses - combined`
+
+Updated regex to handle the index:: case, plus a minor tweak to replace comments with spaces:
+`SearchHeadLevel - Search Queries summary exact match`
+
+`SearchHeadLevel - Search Queries summary non-exact match`
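+
+A sketch of the kind of extraction this implies - audit logs can record an index filter as either index=foo or index::foo, so both forms need capturing (the field and pattern here are simplified compared to the shipped reports):
+
+```
+| rex field=search max_match=0 "index(?:=|::)\s*\"?(?<indexname>[^\s\"]+)"
+```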
+
+Updated links in nav menu:
+[SideView UI (user activity)](https://splunkbase.splunk.com/app/6449/)
+
+### 2.6.12
+Corrected a typo in savedsearches.conf (a missing \ character) (feedback from Vincent)
+
+### 2.6.11
+New dashboards:
+`splunk_introspection_io_stats` - just an I/O focused dashboard based on introspection data
+
+New macros:
+`splunkadmins_shutdown_time_by_shc`
+
+`cluster_masters`
+
+Updated alerts:
+`AllSplunkEnterpriseLevel - Splunkd Log Messages Admins Only` - more criteria
+
+`IndexerLevel - IndexConfig Warnings from Splunk indexers` - updated criteria, using stats instead of top
+
+`SearchHeadLevel - KVStore Or Conf Replication Issues Are Occurring` - updated keywords for new instances, added more criteria to reduce false alarms
+
+`SearchHeadLevel - Lookup updates within SHC` - changed to addCommit instead of acceptPush
+
+Updated dashboards:
+`heavyforwarders_max_data_queue_sizes_by_name_v8` - corrected missing space in "TcpOut KB per second per forwarder" panel (feedback from Vincent)
+
+`indexer_max_data_queue_sizes_by_name` - updated comment on replication queue; replication queue issues now show duration
+
+`smartstore_stats` - updated comment
+
+`splunk_forwarder_output_tuning` - added attribution as the link is available via search engines and public, updated comments
+
+Changed:
+`splunkadmins_userlist_indexinfo` into a csv file to prevent unnecessary restarts related to updating this app (on standalone instances this triggers a restart due to collections.conf); collections.conf was removed from this app
+
+### 2.6.10
+README.md update
+
+New alert:
+`SearchHeadLevel - Excessive REST API usage`
+
+New dashboard:
+`splunk_forwarder_data_balance_tuning` - new dashboard based on Brett Adam's work
+
+New macro:
+`diskusage`
+
+Updated alerts:
+`AllSplunkEnterpriseLevel - Splunkd Log Messages Admins Only` - more criteria
+
+`ForwarderLevel - Channel churn issues` - added another TERM to the search, added a stats line to summarise the result, added a where clause so this fires only if channels are both added and removed
+
+`IndexerLevel - RemoteSearches Indexes Stats` - updated comment and renamed fields
+
+`IndexerLevel - RemoteSearches Indexes Stats Wilcard` - updated comment and renamed fields
+
+`SearchHeadLevel - Detect MongoDB errors` - regex update to remove false positives
+
+`SearchHeadLevel - Indexer Peer Connection Failures` - updated comment and sourcetype
+
+`SearchHeadLevel - platform_stats.user_stats.introspection metrics populating search` - added rounding of fields, updated comment
+
+`SearchHeadLevel - platform_stats.users savedsearches` - added time field
+
+`SearchHeadLevel - platform_stats.users dashboards` - added time field
+
+`SearchHeadLevel - Scheduled Searches That Cannot Run` - corrected failure count so it's accurate
+
+`SearchHeadLevel - Search Messages user level` - more criteria and excluded some warnings
+
+`SearchHeadLevel - Search Queries summary exact match` - updates to stats to include 1 more field, updated regex to match macros in multisearch commands, updated comment, removed extra ' character from search field
+
+`SearchHeadLevel - Search Queries summary non-exact match` - updated comment, updated regex to match macros in multisearch commands, removed extra ' character from search field
+
+Updated dashboards:
+`hec_performance` - to include the additional `num_of_requests_waiting_ack` measurement from introspection data; if this is high it can stop data when tokens have useACK set to true
+
+`smartstore_stats` - various new panels around queueing of downloads, and other potential smartstore issues
+
+`splunk_forwarder_output_tuning` - update to include another measure of data balance
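+
+A sketch of watching that `num_of_requests_waiting_ack` measurement directly (the introspection sourcetype and field path here are from memory and worth verifying in your environment):
+
+```
+index=_introspection sourcetype=http_event_collector_metrics
+| timechart span=5m max(data.num_of_requests_waiting_ack)
+```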
+
+Updated comments on alerts:
+`AllSplunkLevel - Unable To Distribute to Peer`
+
+`SearchHeadLevel - splunk_search_messages dispatch` - description update
+
+Updated metadata file to allow `sc_admin` role access
+
+### 2.6.9
+Updated alerts:
+`AllSplunkEnterpriseLevel - Splunkd Log Messages Admins Only` - removed 1 log entry for consecutive date entries/unretrievable data
+
+`ForwarderLevel - Splunk HEC issues` - added cluster command
+
+New dashboards:
+`ForwarderLevel - Splunk HEC issues`
+
+New reports:
+`IndexerLevel - SmartStore cache misses - remote_searches`
+
+`IndexerLevel - Buckets in cache`
+
+`SearchHeadLevel - Detect searches hitting corrupt buckets`
+
+`SearchHeadLevel - SmartStore cache misses - savedsearches`
+
+`SearchHeadLevel - SmartStore cache misses - dashboards`
+
+`SearchHeadLevel - SmartStore cache misses - combined`
+
+Updated SDK to 1.6.18
+
+Updated alerts/reports to remove unnecessary `TERM()` commands:
+`AllSplunkEnterpriseLevel - Losing Contact With Master Node`
+
+`AllSplunkEnterpriseLevel - Replication Failures`
+
+`AllSplunkEnterpriseLevel - Splunkd Log Messages Admins Only`
+
+`ForwarderLevel - Splunk HEC issues` - included lookup file to translate the HTTP code seen by the client (based on the documentation for version 8.2.3)
+
+`IndexerLevel - Data parsing error`
+
+`IndexerLevel - IndexWriter pause duration`
+
+`IndexerLevel - RemoteSearches Indexes Stats`
+
+`IndexerLevel - RemoteSearches Indexes Stats Wilcard`
+
+`IndexerLevel - RemoteSearches find all time searches`
+
+`IndexerLevel - RemoteSearches find datamodel acceleration with wildcards`
+
+`IndexerLevel - Slow peer from remote searches`
+
+`SearchHeadLevel - Dashboards invalid character in splunkd`
+
+`SearchHeadLevel - platform_stats.remote_searches metrics populating search`
+
+`SearchHeadLevel - savedsearches invalid character in splunkd`
+
+`SearchHeadLevel - Script failures in the last day`
+
+`SearchHeadLevel - Search Messages field extractor slow`
+
+`SearchHeadLevel - Search Messages user level`
+
+`SearchHeadLevel - Search Messages admins only`
+
+### 2.6.8
+New alerts:
+`AllSplunkLevel - No recent metrics.log data`
+
+New dashboards:
+`heavyforwarders_max_data_queue_sizes_by_name_v8` - this version uses tstats with PREFIX so only works with Splunk 8.0+
+
+`indexer_max_data_queue_sizes_by_name_v8` - this version uses tstats with PREFIX so only works with Splunk 8.0+
+
+`splunk_forwarder_output_tuning` - uses metrics.log to measure the TCP output/stdev per name, includes example tuning parameters
+
+New reports:
+`IndexerLevel - platform_stats.indexers stddev measurement` - stdev per indexer cluster (useful for tuning the outputs.conf from incoming servers)
+
+`IndexerLevel - platform_stats.indexers totalgb_thruput measurement` - index thruput measurements
+
+Updated alerts:
+`AllSplunkEnterpriseLevel - Splunkd Log Messages Admins Only` - more alert criteria
+
+`IndexerLevel - Cold data location approaching size limits` - improvements to the calculation of % used
+
+`IndexerLevel - Data parsing error` - added macro `splunkadmins_dataparsing_error` as requested
+
+`SearchHeadLevel - Realtime Scheduled Searches are in use` - updated timeout to 900 seconds, added context to description about potential use (as per feedback from Vincent)
+
+`SearchHeadLevel - Script failures in the last day` - improved user id matching
+
+`SearchHeadLevel - Search Messages admins only` - more alert criteria
+
+`SearchHeadLevel - Search Messages user level` - more alert criteria
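+
+The v8 dashboards above rely on the Splunk 8.0+ PREFIX() feature of tstats, which reads values such as name=indexqueue straight from the indexed tokens of metrics.log. A minimal sketch (the exact fields used by the dashboards will differ):
+
+```
+| tstats max(PREFIX(current_size_kb=)) AS queue_kb
+    WHERE index=_internal source=*metrics.log* TERM(group=queue)
+    BY host, PREFIX(name=), _time span=5m
+| rename "name=" AS queue_name
+```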
+
+Updated macros:
+`splunkadmins_shutdown_keyword` - updated keyword for shutdown state
+
+`splunkadmins_shutdown_list` - updated keyword for shutdown state
+
+`splunkadmins_shutdown_time` - updated keyword for shutdown state
+
+Updated reports:
+`IndexerLevel - platform_stats.counters hosts` - updated to use `indexer_cluster_name` macro
+
+`IndexerLevel - platform_stats.counters hosts 24hour` - updated to use `indexer_cluster_name` macro
+
+`IndexerLevel - platform_stats.indexers totalgb measurement` - updated to use `indexer_cluster_name` macro, comment update
+
+`IndexerLevel - RemoteSearches find datamodel acceleration with wildcards` - handling the IN clause in `remote_searches.log`
+
+`IndexerLevel - RemoteSearches Indexes Stats` - added short field (set to False), to make queries easier
+
+`SearchHeadLevel - platform_stats.users dashboards` - updated mcollect comment
+
+`SearchHeadLevel - Search Messages user level` - added more error messages, limited the message to the first 30 messages
+
+`SearchHeadLevel - Search Messages admins only` - added more error messages
+
+`SearchHeadLevel - Search Queries summary exact match` - excluded Remote storage searches (no real difference)
+
+`SearchHeadLevel - Search Queries summary non-exact match` - excluded Remote storage searches (no real difference)
+
+### 2.6.7
+New alerts:
+`IndexerLevel - SmartStore - Bucket cache errors audit logs`
+
+`SearchHeadLevel - Accelerated DataModels with wildcard or no index specified`
+
+New reports:
+`IndexerLevel - IndexWriter pause duration`
+
+`IndexerLevel - RemoteSearches find all time searches`
+
+`IndexerLevel - RemoteSearches find datamodel acceleration with wildcards`
+
+`SearchHeadLevel - platform_stats.audit metrics users 24hour`
+
+`SearchHeadLevel - platform_stats.users dashboards`
+
+`SearchHeadLevel - platform_stats.users savedsearches`
+
+Updated alerts:
+`AllSplunkEnterpriseLevel - sendmodalert errors` - updated to refer to `SearchHeadLevel - Script failures in the last day` as it replaces most of this alert's functionality...
+
+`AllSplunkEnterpriseLevel - Splunkd Log Messages Admins Only` - more alert criteria
+
+`DeploymentServer - Error Found On Deployment Server`
+
+`SearchHeadLevel - audit logs showing all time searches` - minor correction to display all searches without a `savedsearch_name`
+
+`SearchHeadLevel - Accelerated DataModels with All Time Searching Enabled` - re-wrote the search to not use map
+
+`SearchHeadLevel - Script failures in the last day` - updated to handle various webhook failures
+
+Updated reports:
+`IndexerLevel - RemoteSearches Indexes Stats` - updates to work with search heads with _ in the name, improved handling of "skipped" entries
+
+`IndexerLevel - RemoteSearches Indexes Stats Wilcard` - updates to work with search heads with _ in the name, improved handling of "skipped" entries
+
+`SearchHeadLevel - Search Queries summary non-exact match` - new field "short", updated regex
+
+`SearchHeadLevel - platform_stats.user_stats.introspection metrics populating search` - updates to work with search heads with _ in the name
+
+`SearchHeadLevel - platform_stats.remote_searches metrics populating search` - updates to work with search heads with _ in the name
+
+### 2.6.6
+Updated to Splunk python SDK 1.6.16
+
+Merged a fix from jordanfelle for a special character
+
+Updated alerts:
+`SearchHeadLevel - dispatch metadata files may need removal`
+
+`SearchHeadLevel - Dashboards with all time searches set`
+
+### 2.6.5
+New reports:
+`IndexerLevel - RemoteSearches Indexes Stats Wilcard` - example wildcard match for remote_searches.log
+
+`SearchHeadLevel - Index list by cluster report` - for a list of indexes by indexer cluster
+
+Updated reports:
+`IndexerLevel - RemoteSearches Indexes Stats` - added additional info around bucket cache usage, improved accuracy, provided mcollect example
+
+`IndexerLevel - Slow peer from remote searches` - added more search types into the list
+
+`SearchHeadLevel - Search Queries summary exact match` - improved accuracy for append/join/multisearch/set
+
+`SearchHeadLevel - Search Queries summary non-exact match` - improved accuracy for append/join/multisearch/set
+
+Updated alerts:
+`AllSplunkEnterpriseLevel - Splunk Servers with resource starvation` - as per github issue #12, thanks RahimAbdulla
+
+`SearchHeadLevel - Detect MongoDB errors` - fixed the alert by re-adding the fillnull into the subsearch
+
+Updated alerts/reports with the new search macro for audit logs:
+`SearchHeadLevel - Users with auto-finalized searches`
+
+`SearchHeadLevel - Search Queries By Type Audit Logs`
+
+`SearchHeadLevel - Search Queries By Type Audit Logs macro version`
+
+`SearchHeadLevel - Search Queries By Type Audit Logs macro version other`
+
+`SearchHeadLevel - Detect Excessive Search Use - Dashboard - Automated`
+
+`SearchHeadLevel - platform_stats.audit metrics searches`
+
+`SearchHeadLevel - platform_stats.audit metrics users`
+
+`SearchHeadLevel - Searches dispatched as owner by other users`
+
+Updated alerts/reports with (?s) as some logs are now multi-line in 8.2.x (updating just in case):
+`SearchHeadLevel - Scheduled searches not specifying an index`
+
+`SearchHeadLevel - User - Dashboards searching all indexes`
+
+`SearchHeadLevel - Realtime Search Queries in dashboards`
+
+`SearchHeadLevel - Scheduled searches not specifying an index macro version`
+
+`SearchHeadLevel - User - Dashboards searching all indexes macro version`
+
+`SearchHeadLevel - Determine query scan density`
+
+`SearchHeadLevel - Users with auto-finalized searches`
+
+`SearchHeadLevel - Scheduled searches status`
+
+`SearchHeadLevel - Dashboard refresh intervals`
+
+Updated macros:
+`splunkadmins_audit_logs_macro_sub_v8` - to work in more cases (more output but less chance of missing a macro)
+
+Updated all dashboards to include the version="1.1" tag as required for new Splunk versions
+
+### 2.6.4
+Updated alerts:
+`AllSplunkLevel - Splunk forwarders that are not talking to the deployment server` - contribution via email (Vincent)
+
+`AllSplunkEnterpriseLevel - Splunkd Log Messages Admins Only` - a few new additions
+
+`SearchHeadLevel - datamodel errors in splunkd` - excluded kvstore shutdown
+
+`SearchHeadLevel - Search Messages admins only` - new exclusions
+
+Updated dashboard:
+`issues_per_sourcetype` - the `Invalid parsed time` panel needed another regex - contribution via email (Vincent)
+
+Updated reports:
+`SearchHeadLevel - Search Queries summary exact match` - minor updates, added cache stats, improved accuracy
+
+`SearchHeadLevel - Search Queries summary non-exact match` - minor updates, added cache stats, improved accuracy
+
+Renamed/replaced reports:
+`SearchHeadLevel - Search Queries summary exact match 73` - new name is `SearchHeadLevel - Search Queries summary exact match`
+
+`SearchHeadLevel - Search Queries summary non-exact match 73` - new name is `SearchHeadLevel - Search Queries summary non-exact match`
+
+`SearchHeadLevel - Search Queries summary exact match 73 by user` - new name is `SearchHeadLevel - Search Queries summary exact match by user`
+
+`SearchHeadLevel - Search Queries summary exact match 73 by index` - new name is `SearchHeadLevel - Search Queries summary exact match by index`
+
+Updates to:
+`streamfilter.py` - corrected a UTF-8 error under Python 3
+
+`streamfilterwildcard.py` - corrected a UTF-8 error under Python 3
+
+### 2.6.3
+New alert:
+`SearchHeadLevel - authorize.conf settings will prevent some users from appearing in the UI`
+
+Updated alerts:
+`AllSplunkEnterpriseLevel - Splunkd Log Messages Admins Only` - a few more errors
+
+`SearchHeadLevel - Search Messages user level` - updated comment, added sid field
+
+`SearchHeadLevel - Search Messages admins only` - added sid field
+
+`SearchHeadLevel - Detect MongoDB errors` - added partial flag to remove false alarms (thanks afx)
+
+`IndexerLevel - Timestamp parsing issues combined alert` - updated to provide a list of hosts per sourcetype
+
+Updated dashboards:
+`detect_excessive_search_use` - removed the ldap query section (as this is env specific)
+
+`issues_per_sourcetype` - wording update on title
+
+`knowledge_objects_by_app` - corrected drilldown link to point to the SplunkAdmins app (thanks Vincent!)
+
+Updated Splunk python SDK to 1.6.15
+
+### 2.6.2
+Identical to 2.6.1, re-released to get around an automated app inspect failure
+
+### 2.6.1
+2 navigation menu items fixed (incorrect alert names) by pull request from EsOsO
+
+New alerts:
+`SearchHeadLevel - Splunk alert actions exceeding the max_action_results limit` - detects if any alert action exceeds the limit and receives limited results, currently a silent failure as per https://ideas.splunk.com/ideas/EID-I-781
+
+Updated alerts:
+`AllSplunkEnterpriseLevel - Splunkd Log Messages Admins Only` - exclusion for config reload requiring restart
+
+`IndexerLevel - Search Failures` - comment/description update only (replaced by search messages based alerts)
+
+`SearchHeadLevel - Detect MongoDB errors` - added missing | symbol as per email update from afx
+
+`SearchHeadLevel - Search Messages user level` - excluded messages from kvstore initialization and a few others, added macros
+
+`SearchHeadLevel - Search Messages admins only` - added messages for kvstore unknown status and a few others, added macros
+
+`SearchHeadLevel - SHC Captain unable to establish common bundle` - excluded indexer shutdown times
+
+`SearchHeadLevel - Splunk alert actions exceeding the max_action_results limit` - now ignores emails with no results inline (the alert now joins with savedsearch info via map), added macro
+
+### 2.6.0
+Various README.md updates
+
+New alerts:
+
+`AllSplunkEnterpriseLevel - Splunkd Log Messages Admins Only` this generic alert is designed to capture a variety of splunkd log messages that warrant further investigation or show an issue exists that should be fixed. This alert is generic and captures many errors.
+
+`DeploymentServer - Error Found On Deployment Server` this alert captures deployment server errors; it is more generic than the current alert and designed to catch more scenarios
+
+`SearchHeadLevel - Dashboards invalid character in splunkd` this alert finds errors in splunkd related to invalid characters in a dashboard
+
+`SearchHeadLevel - savedsearches invalid character in splunkd` this alert finds errors in splunkd related to invalid characters in a saved search
+
+`SearchHeadLevel - datamodel errors in splunkd` this alert finds errors related to data models in the splunkd logs
+
+`SearchHeadLevel - Search Messages user level` this alert is designed to be combined with an app like sendresults.
+It searches the splunk search messages and looks for errors that should be actionable by an end user.
+This is designed to be a generic alert covering many failure scenarios
+
+`SearchHeadLevel - Search Messages admins only` this alert searches the splunk search messages but is designed to find errors that cannot be fixed by end users; the user level version is for end user level errors
+
+New lookup file:
+
+`splunkadmins_rmd5_to_savedsearchname.csv`
+
+New reports:
+
+`SearchHeadLevel - RMD5 to savedsearch_name lookupgen report` new helper report for translating rmd5 names in the search id back to a report name.
+
+`SearchHeadLevel - Search Messages field extractor slow` looks for messages about a slow field extractor in the splunk search messages
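+
+A sketch of how that rmd5 lookup can be applied (the rmd5/savedsearch_name column names are assumptions - check the header of the generated csv):
+
+```
+index=_audit action=search search_id=*RMD5*
+| rex field=search_id "(?<rmd5>RMD5[0-9a-f]+)"
+| lookup splunkadmins_rmd5_to_savedsearchname.csv rmd5 OUTPUT savedsearch_name
+```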
+
+Updated macro:
+
+`search_type_from_sid` to work with real-time searches
+
+Updated alerts:
+
+`AllSplunkLevel - Application Installation Failures From Deployment Manager` updated to handle download failures and use the cluster command
+
+`AllSplunkEnterpriseLevel - Email Sending Failures` updated to work with logging changes in 8.0.x
+
+`AllSplunkEnterpriseLevel - Splunk Servers throwing runScript errors` updated to work with logging changes in 8.0.x
+
+`AllSplunkEnterpriseLevel - Splunk Servers with resource starvation` now includes an additional error/warning message
+
+`AllSplunkEnterpriseLevel - Replication Failures` now includes more types of knowledge bundle replication issues and uses the cluster command
+
+`IndexerLevel - IndexConfig Warnings from Splunk indexers` updated to include error level messages
+
+`IndexerLevel - Slow peer from remote searches` updated to remove special double quote characters
+
+`IndexerLevel - Peer will not return results due to outdated generation` description updated to refer to `AllSplunkEnterpriseLevel - Losing Contact With Master Node`
+
+`IndexerLevel - Data parsing error` now includes csv and json line breaker errors, now uses stats instead of cluster
+
+`SearchHeadLevel - Script failures in the last day` expanded to handle modular alerts and script errors in one alert. Also attempts to translate base64 or encoded report names back to human readable versions
+
+`SearchHeadLevel - Macro report` updated crontab to all days of the week
+
+`SearchHeadLevel - Users with auto-finalized searches` description update
+
+`SearchHeadLevel - Search Queries summary exact match 73` minor update to deal with real-time searches in regex
+
+`SearchHeadLevel - Search Queries summary non-exact match 73` minor update to deal with real-time searches in regex
+
+`SearchHeadLevel - SHC Captain unable to establish common bundle` updated to include one more error/warning message
+
+`SearchHeadLevel - platform_stats access summary` updated to deal with real-time searches in regex
+
+`SearchHeadLevel - Dashboards using special characters` added ignore for trackme and network diagram viz as this was breaking the rex command, also removed an extra rex line
+
+`SearchHeadLevel - splunk_search_messages dispatch` comment update only
+
+`SearchHeadLevel - dispatch metadata files may need removal` updated to use macro
+
+`SearchHeadLevel - Search Queries summary exact match 73` description/comment update
+
+`SearchHeadLevel - Search Queries summary non-exact match 73` description/comment update
+
+Renamed alert:
+
+`IndexerLevel - Splunk Indexers Losing Contact With Master` to `AllSplunkEnterpriseLevel - Losing Contact With Master Node` alert renamed and now includes search head to master node and indexers to master node in one alert
+
+Removed alert:
+
+`IndexerLevel - Unable to replicate thawed directories in a cluster`
+
+### 2.5.14
+Update Splunk python SDK to 1.6.14
+
+New alerts:
+`IndexerLevel - Slow peer from remote searches`
+
+Updated dashboard:
+`hec_performance` as per pull request from jordanfelle
+
+### 2.5.13
+Minor fixes for app inspect (new empty lookup file)
+
+### 2.5.12
+New alerts:
+`SearchHeadLevel - splunk_search_messages dispatch`
+
+`SearchHeadLevel - WLM aborted searches`
+
+`SearchHeadLevel - dispatch metadata files may need removal`
+
+Minor changes to reports:
+`SearchHeadLevel - Search Queries summary exact match 73`
+
+`SearchHeadLevel - Search Queries summary non-exact match 73`
+
+And macro:
+`splunkadmins_audit_logs_datamodel_sub`
+
+Updated alert:
+`SearchHeadLevel - Dashboards with all time searches set` to look for earliest= in tokens and to ignore that case
+
+Updated reports:
+`SearchHeadLevel - Indexer Peer Connection Failures`
+
+`SearchHeadLevel - Detect searches hitting corrupt buckets`
+
+The above were updated to use the `splunk_search_messages` sourcetype
+
+`IndexerLevel - Knowledge bundle upload stats` updated to handle cascading bundle replication
+
+### 2.5.11
+Added notes around the `log_search_messages` property under [search] in limits.conf
+
+New macros:
+`conf_rest_endpoint`
+
+`splunkadmins_epoch`
+
+`splunkadmins_audit_logs_datamodel_sub`
+
+`splunkadmins_audit_logs_eventtypes_sub`
+
+`splunkadmins_audit_logs_macro_sub_v8` - note this version uses mvmap so requires Splunk v8+; the `splunkadmins_audit_logs_macro_sub` still exists for pre-version 8 but can only replace 1 macro per run...
+
+`splunkadmins_audit_logs_tags_sub`
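+
+A sketch of the substitution idea behind these _sub macros (the `title`/`definition` lookup columns are assumptions; the v8 macro wraps a similar replace() inside mvmap() so several macros can be expanded in one pass):
+
+```
+| rex field=search "`(?<macro_name>[^`(]+)`"
+| lookup splunkadmins_macros.csv title AS macro_name OUTPUT definition AS macro_definition
+| eval search=if(isnotnull(macro_definition), replace(search, "`" . macro_name . "`", macro_definition), search)
+```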
+
+New reports:
+`SearchHeadLevel - DataModels report`
+
+`SearchHeadLevel - Tags report`
+
+`SearchHeadLevel - EventTypes report`
+
+Updated dashboard `troubleshooting_resource_usage_per_user_drilldown` to display the correct time range for more searches
+
+Updated reports:
+`IndexerLevel - RemoteSearches Indexes Stats` - to summarize indexes stats
+
+`SearchHeadLevel - Scheduled searches not specifying an index macro version`
+
+`SearchHeadLevel - User - Dashboards searching all indexes macro version`
+
+`SearchHeadLevel - Search Queries By Type Audit Logs macro version`
+
+`SearchHeadLevel - Search Queries By Type Audit Logs macro version other`
+
+`SearchHeadLevel - Dashboards with all time searches set`
+
+To use the new macro `splunkadmins_audit_logs_macro_sub_v8`
+
+Updated reports:
+`SearchHeadLevel - Search Queries summary exact match 73`
+
+`SearchHeadLevel - Search Queries summary non-exact match 73`
+
+To use the new macros `splunkadmins_audit_logs_macro_sub_v8`, `splunkadmins_audit_logs_eventtypes_sub`, `splunkadmins_audit_logs_datamodel_sub`, `splunkadmins_audit_logs_tags_sub`
+
+### 2.5.10
+Updated to Splunk python SDK 1.6.13 (the previous 2.5.9 release did not include this update)
+
+New alerts:
+`AllSplunkLevel - TailReader Ignoring Path`
+
+`ForwarderLevel - Channel churn issues`
+
+`SearchHeadLevel - Dashboards with all time searches set`
+
+New reports:
+`SearchHeadLevel - audit logs showing all time searches`
+
+Updated reports:
+`SearchHeadLevel - Macro report` to use the new macro
+
+`SearchHeadLevel - Search Queries summary exact match 73` to use the new macro
+
+`SearchHeadLevel - Search Queries summary non-exact match 73` to use the new macro
+
+New macros:
+`splunkadmins_splunk_server_name`
+
+### 2.5.9
+New alerts:
+`AllSplunkLevel - Unexpected termination of a Splunk process windows`
+
+`AllSplunkLevel - Unexpected termination of a Splunk process unix`
+
+`IndexerLevel - strings_metadata triggering bucket rolling`
+
+New reports:
+`ForwarderLevel - Data dropping duration`
+
+`SearchHeadLevel - Lookup CSV size`
+
+New dashboards:
+`lookup_audit`
+
+New macro:
+`mylookups` (7.3.3+ only)
+
+New nav menu items:
+Hyperlink to https://github.com/silkyrich/cluster_health_tools
+
+Updated to Splunk python SDK 1.6.12
+Set `python.version = python3` within inputs.conf.spec as per appinspect requirement
+
+### 2.5.8
+New alerts:
+`ClusterMasterLevel - excess buckets on master`
+
+Updated alerts:
+`ForwarderLevel - Splunk HEC issues` - corrected criteria for newer Splunk versions and added more matching
+
+`SearchHeadLevel - SHC Captain unable to establish common bundle` - removed a special character from the comment
+
+Renamed alert:
+`IndexerLevel - Buckets are been frozen due to index sizing` to `IndexerLevel - Buckets have being frozen due to index sizing` (as requested by woodcock)
+
+New reports:
+`SearchHeadLevel - Dashboards using special characters`
+
+`SearchHeadLevel - SavedSearches using special characters`
+
+### 2.5.7
+Moved the lib directory to bin/lib (as it does not distribute to the indexers otherwise; sent feedback on https://dev.splunk.com/enterprise/docs/python/sdk-python/howtousesplunkpython/howtocreatemodpy/ so this gets updated)
+
+New macro:
+
+`base64decode` this macro requires decrypt or a similar app to be useful, but the searches utilising it will work fine without it...
+
+New reports:
+`SearchHeadLevel - platform_stats.audit metrics searches`
+
+`SearchHeadLevel - platform_stats.audit metrics users`
+
+`SearchHeadLevel - platform_stats.audit metrics api`
+
+The above 3 replace `SearchHeadLevel - platform_stats.audit metrics`, which is now removed.
+
+New reports continued:
+
+`IndexerLevel - RemoteSearches Indexes Stats`
+
+`SearchHeadLevel - DataModel Fields`
+
+`SearchHeadLevel - Dashboard refresh intervals`
+
+`SearchHeadLevel - Dashboards using depends and running searches in the background`
+
+`SearchHeadLevel - Summary searches using realtime search scheduling`
+
+`SearchHeadLevel - Searches dispatched as owner by other users`
+
+Updated reports:
+
+`SearchHeadLevel - Search Queries summary exact match`
+
+`SearchHeadLevel - Search Queries summary non-exact match`
+
+Minor tweaks to the regex for both of the above
+
+`SearchHeadLevel - Search Queries summary exact match 73`
+
+`SearchHeadLevel - Search Queries summary non-exact match 73`
+
+The above now attempt to handle append, join, appendcols, multisearch
+
+Also updated reports:
+
+`SearchHeadLevel - platform_stats.remote_searches metrics populating search` to ignore pretypeahead/copybuckets searches, and default acceleration searches
+
+`SearchHeadLevel - platform_stats.user_stats.introspection metrics populating search` to include the indexer cluster as a field
+
+`SearchHeadLevel - Scheduled Searches That Cannot Run` to handle additional failure scenarios
+
+Updated `streamfilter.py`, `lookup_watcher.py` and `streamfilterwildcard.py` so they include the libraries from bin/lib
+
+### 2.5.6
+Further updates to the new reports from 2.5.5 relating to platform stats; improved accuracy in identifying dashboard usage vs ad-hoc searches
+
+Updated `SearchHeadLevel - platform_stats access summary` to include searches triggered (which are often coming from dashboard usage)
+
+New report:
+
+`SearchHeadLevel - platform_stats.remote_searches metrics populating search`
+
+Updated reports:
+
+`IndexerLevel - platform_stats.counters hosts`
+
+`IndexerLevel - platform_stats.counters hosts 24hour`
+
+`IndexerLevel - platform_stats.indexers totalgb measurement`
+
+`SearchHeadLevel - SHC conf log summary`
+
+`SearchHeadLevel - platform_stats.audit metrics`
+
+`SearchHeadLevel - platform_stats.user_stats.introspection metrics populating search`
+
+`SearchHeadLevel - platform_stats access summary`
+
+New macro:
+
+`search_type_from_sid`
+
+### 2.5.5
+Lookup Watcher now imports six from the lib directory (allows this to work on older Splunk versions)
+Minor update to props.conf for splunk:search:info as in 7.3 auto-finalized messages are now INFO level
+
+New alert:
+
+`SearchHeadLevel - SHC Captain unable to establish common bundle`
+
+New reports:
+
+`IndexerLevel - platform_stats.counters hosts`
+
+`IndexerLevel - platform_stats.counters hosts 24hour`
+
+`IndexerLevel - platform_stats.indexers totalgb measurement`
+
+`SearchHeadLevel - SHC conf log summary`
+
+`SearchHeadLevel - platform_stats.audit metrics`
+
+`SearchHeadLevel - platform_stats.user_stats.introspection metrics populating search`
+
+`SearchHeadLevel - platform_stats access summary`
+
+Updated dashboard:
+
+`indexer_max_data_queue_sizes_by_name`
+
+New macro:
+
+`search_head_cluster`
+
+### 2.5.4
+Re-release of 2.5.3 due to a strange issue in SplunkBase
+
+### 2.5.3
+Lookup files are now included (zero sized); note that you will need to re-generate them after install if you overwrite the lookups used by some reports...
+
+New macros:
+
+`splunkadmins_audit_logs_macro_sub`
+
+`splunkadmins_remote_macros` (this macro requires TA-webtools); alternatively you can use the Mothership app (SplunkBase)
+
+`splunkadmins_remote_roles` (this macro requires TA-webtools); alternatively you can use the Mothership app (SplunkBase)
+
+New reports:
+
+`SearchHeadLevel - IndexesPerRole Remote Report`
+
+`SearchHeadLevel - IndexesPerRole Report`
+
+`SearchHeadLevel - IndexesPerRole srchIndexesallowed Report`
+
+`SearchHeadLevel - IndexesPerRole srchIndexesdefault Report`
+
+`SearchHeadLevel - Search Queries summary exact match 73`
+
+`SearchHeadLevel - Search Queries summary exact match 73 by user` (uses Search Queries summary exact match 73 as a base)
+
+`SearchHeadLevel - Search Queries summary exact match 73 by index` (uses Search Queries summary exact match 73 as a base)
+
+`SearchHeadLevel - Search Queries summary non-exact match 73`
+
+`SearchHeadLevel - IndexesPerUser Report`
+
+Updated alerts:
+
+`IndexerLevel - Time format has changed multiple log types in one sourcetype`
+
+`IndexerLevel - Timestamp parsing issues combined alert`
+
+Updated dashboard:
+
+`issues_per_sourcetype`
+
+Updated report:
+
+`SearchHeadLevel - Macro report`
+
+With a new regex due to a change in newer Splunk versions (credit to woodcock for the update)
+
+Lookup file `splunkadmins_macros_temp.csv` renamed to `splunkadmins_macros.csv`
+
+Changes for python3 compatibility
+
+Updated python SDK to 1.6.11 (from 1.6.6)
+
+### 2.5.2
+New modular input - Lookup Watcher - details in the README.md file
+Introduced a new sub-menu in the navigation menu for Search Head Level, "Recommended (externally hosted)", with links to external dashboards
+
+Updated reports:
+`SearchHeadLevel - Search Queries By Type Audit Logs`
+`SearchHeadLevel - Search Queries By Type Audit Logs macro version`
+`SearchHeadLevel - Search Queries By Type Audit Logs macro version other`
+
+To reduce the number of unknown queries
+
+Updated reports:
+`SearchHeadLevel - Search Queries summary exact match`
+`SearchHeadLevel - Search Queries summary non-exact match`
+
+To improve the statistics around indexes found
+
+### 2.5.1
+Updated alert - `SearchHeadLevel - Scheduled Searches That Cannot Run` tweaked to find more results
+
+Updated dashboard `issues per sourcetype` to handle message becoming event_message in newer Splunk versions (7.1 or 7.2)
+
+Updated macros `splunkadmins_shutdown_list`, `splunkadmins_shutdown_keyword`, `splunkadmins_shutdown_time`, `splunkadmins_transfer_captain_times` to handle message becoming event_message in newer Splunk versions (7.1 or 7.2)
+
+Updated python files streamfilter/streamfilterwildcard to import lib relative to the current app name
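+
+Several items above and the list below deal with `message` becoming `event_message` in newer Splunk versions; a sketch of the usual compatibility pattern (not the app's exact SPL):
+
+```
+| eval event_message=coalesce(event_message, message)
+```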
+
+Updated alerts / reports:
+- `AllSplunkLevel execprocessor errors`
+- `AllSplunkLevel - TCP Output Processor has paused the data flow`
+- `AllSplunkEnterpriseLevel - Detect LDAP groups that no longer exist`
+- `AllSplunkEnterpriseLevel - Email Sending Failures`
+- `AllSplunkEnterpriseLevel - File integrity check failure`
+- `AllSplunkEnterpriseLevel - Non-existent roles are assigned to users`
+- `AllSplunkEnterpriseLevel - Replication Failures`
+- `AllSplunkEnterpriseLevel - TCP or SSL Config Issue`
+- `AllSplunkEnterpriseLevel - Unable to dispatch searches due to disk space`
+- `DeploymentServer - btool validation failures occurring on deployment server`
+- `DeploymentServer - Unsupported attribute within DS config`
+- `ForwarderLevel - crcSalt or initCrcLength change may be required`
+- `ForwarderLevel - Splunk Universal Forwarders Exceeding the File Descriptor Cache`
+- `IndexerLevel - Buckets are been frozen due to index sizing`
+- `IndexerLevel - IndexConfig Warnings from Splunk indexers`
+- `IndexerLevel - Index not defined`
+- `IndexerLevel - Peer will not return results due to outdated generation`
+- `IndexerLevel - Time format has changed multiple log types in one sourcetype`
+- `IndexerLevel - Too many events with the same timestamp`
+- `IndexerLevel - Valid Timestamp Invalid Parsed Time`
+- `SearchHeadLevel - KVStore Or Conf Replication Issues Are Occurring`
+- `SearchHeadLevel - LDAP users have been disabled or left the company cleanup required`
+- `SearchHeadLevel - Scheduled Searches That Cannot Run`
+- `SearchHeadLevel - Scheduled searches failing in cluster with 404 error`
+To handle message becoming event_message in newer Splunk versions (7.1 or 7.2)
+
+### 2.5.0
+New dashboard `HEC Performance` (original from [camrunr's github](https://github.com/camrunr/hec_perf_report/blob/master/hec_perf_report.xml))
+
+New macro - `splunkadmins_shutdown_keyword`
+
+New report - `IndexerLevel - Knowledge bundle upload stats`
+
+Updated alert - `AllSplunkEnterpriseLevel - Replication Failures` with new criteria and excluded shutdowns
+
+Updated alert - `AllSplunkEnterpriseLevel - Splunk Scheduler skipped searches and the reason` to handle another skipped scenario
+
+Updated alert - `AllSplunkEnterpriseLevel - Splunk Servers with resource starvation` with new comments
+
+Updated alert - `SearchHeadLevel - Detect MongoDB errors` with an update to handle a tstats issue in Splunk (issue #3 in github)
+
+Moved splunklib into the "lib" directory of the app as per updated appinspect recommendations
+
+### 2.4.9
+Updated alert - `SearchHeadLevel - Detect MongoDB errors` to include " W " based on git feedback
+
+### 2.4.8
+New alert - `ForwarderLevel - Splunk HEC issues`
+
+New dashboard - `Lookups in use finder`
+
+New macro - `splunkadmins_license_usage_source`
+
+New report - `IndexerLevel - Maximum memory utilisation per search`
+
+New report - `SearchHeadLevel - Lookup updates within SHC`
+
+New report - `SearchHeadLevel - Maximum memory utilisation per search`
+
+New report - `SearchHeadLevel - Detect Excessive Search Use - Dashboard - Automated`
+
+Updated alert - `AllSplunkEnterpriseLevel - Replication Failures` to match more results
+
+Updated alert - `ForwarderLevel - Splunk HTTP Listener Overwhelmed` comment/description update
+
+Updated dashboard - `Rolled buckets by index` - to no longer hardcode Linux paths to the license usage log
+
+Updated dashboard - `Heavy Forwarders Max Data Queue Sizes by name` to use the thruput in the metrics.log
+
+### 2.4.7
+New README (README.md replaces README)
+New dashboard `Detect excessive search usage`
+New dashboard `Cluster Master Jobs`
+New dashboard `Knowledge Objects by app` (and a drilldown dashboard)
+New report - `IndexerLevel - Corrupt buckets via DBInspect`
+New report - `SearchHeadLevel - Detect changes to knowledge objects`
+New report - `SearchHeadLevel - Detect changes to knowledge objects directory`
+New report - `SearchHeadLevel - Detect changes to knowledge objects non-directory`
+Updated alert `ForwarderLevel - Splunk Universal Forwarders Exceeding the File Descriptor Cache` (comment update)
+Updated alert `IndexerLevel - Uneven Indexed Data Across The Indexers` to handle a varying number of indexers
+Updated various reports to include the `splunkadmins_restmacro`; this ensures `splunk_server=local` is used where appropriate
+Updated dashboard `heavyforwarders_max_data_queue_sizes_by_name`, now has a filter for hosts to look at, corrected the TCPOut KB per second panel
+Updated macro `splunkadmins_splunkd_source` now defaults to `*splunkd.log` (previously `/opt/splunk/var/log/splunk/splunkd.log`)
+Updated macro `splunkadmins_mongo_source` now defaults to `*mongod.log` (previously `/opt/splunk/var/log/splunk/mongod.log`)
+Updated report `SearchHeadLevel - Search Queries summary exact` to remove the mvexpand and selfjoin (replaced by stats)
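+
+The `splunkadmins_restmacro` change above matters because | rest fans out to all search peers by default; restricting it looks roughly like this (a sketch - the macro's real definition may differ):
+
+```
+| rest /servicesNS/-/-/saved/searches splunk_server=local
+| table title eai:acl.app eai:acl.owner
+```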
+
+### 2.4.6
+New alert - AllSplunkLevel - Data Loss on shutdown
+New macro - whataccessdoihave - can be used with | `whataccessdoihave` by users
+New report - SearchHeadLevel - Dashboard load times
+New report - SearchHeadLevel - Scheduled searches status
+Updated dashboard - Troubleshooting Resource Usage Per User Drilldown - now uses `search_et/search_lt`
+Removed report - What access do I have? (replaced by the macro/What access do I have without REST)
+
+Upgraded Splunk python SDK to 1.6.6; note that if this causes problems with other applications, removing the bin directory only disables the "Search Queries summary non-exact match" report
+
+### 2.4.5
+Minor corrections
+
+Updated SearchHeadLevel - Search Queries By Type Audit Logs - minor tweak to macroWithIndexClause
+Updated SearchHeadLevel - Search Queries By Type Audit Logs macro version - minor tweak to macroWithIndexClause and hasMacro
+Updated SearchHeadLevel - Search Queries By Type Audit Logs macro version other - minor tweak to macroWithIndexClause and hasMacro
+
+### 2.4.4
+New command - streamfilter
+New command - streamfilterwildcard
+New dashboard - Troubleshooting Resource Usage Per User
+New dashboard - Troubleshooting Resource Usage Per User Drilldown
+New report - SearchHeadLevel - Index access list by user
+New report - SearchHeadLevel - Index list report
+New report - SearchHeadLevel - Role access list by user
+New report - SearchHeadLevel - Scheduled Search Efficiency
+New report - SearchHeadLevel - Search Queries Per Day Audit Logs
+New report - SearchHeadLevel - Search Queries By Type Audit Logs
+New report - SearchHeadLevel - Search Queries By Type Audit Logs macro version
+New report - SearchHeadLevel - Search Queries By Type Audit Logs macro version other
+New report - SearchHeadLevel - Search Queries summary exact match
+New report - SearchHeadLevel - Search Queries summary non-exact match
+New report - SearchHeadLevel - Users with auto-finalized searches
+New report - What Access Do I Have Without REST? To work without the `dispatch_rest_to_indexers` capability
+Updated alert - AllSplunkEnterpriseLevel - Email Sending Failures - to make it easier to automate
+Updated alert - SearchHeadLevel - Captain Switchover Occurring - to ignore a harmless warning message (NOT_LEADER)
+Updated alert SearchHeadLevel - Scheduled Searches That Cannot Run to no longer ignore map alerts from this app
+Updated alert - SearchHeadLevel - Scheduled searches not specifying an index macro version - to use an improved macro match
+Updated alert - SearchHeadLevel - User - Dashboards searching all indexes macro version - to use an improved macro match
+Updated dashboard Troubleshooting indexer CPU as sort was not working
+Updated report - SearchHeadLevel - Macro report - to use `splunk_server=local`
+Updated report - What Access Do I Have? to use `splunk_server=local`
+Updated report - What Access Do I Have Without REST? to supply the index list
+
+### 2.4.3
+A very minor release; the app inspect CLI and REST API provided different results on what needed to be fixed
+
+Updated alert SearchHeadLevel - Users exceeding the disk quota introspection to not throw an error in the map command when there are no results
+Updated alert SearchHeadLevel - Users exceeding the disk quota to not throw an error in the map command when there are no results
+
+### 2.4.2
+A very minor release; the app inspect badge does not allow external dependencies, so 1 alert was changed to get the badge
+
+Updated alert SearchHeadLevel - Users exceeding the disk quota introspection to comment out the sendresults command
+
+### 2.4.1
+Introduced an updated navigation menu to navigate around the alerts, reports and dashboards available in the app
+Changed the label of all dashboards to have Dashboard - ... this is just to make the navigation menu work as expected
+
+New alert IndexerLevel - Buckets changes per day
+New alert IndexerLevel - Timestamp parsing issues combined alert
+New report SearchHeadLevel - Audit log search example only
+Updated alert IndexerLevel - Future Dated Events that appeared in the last week to +10y instead of +20y
+Updated alert IndexerLevel - Indexer Queues May Have Issues - to work with multiple pipelines
+Updated alert IndexerLevel - Buckets rolling more frequently than expected with an improved regex
+Updated alert SearchHeadLevel - Captain Switchover Occurring - to ignore manual captain transfers
+Corrected alert SearchHeadLevel - Determine query scan density with a relevant query
+
+Note 2.4.0 was never released
+
+### 2.3.9
+Updated alert SearchHeadLevel - Detect searches hitting corrupt buckets to detect 1 more variation of the issue
+Updated alert SearchHeadLevel - Users exceeding the disk quota to include the username
+Updated alert SearchHeadLevel - Scheduled Searches That Cannot Run to ignore the new report (SearchHeadLevel - Users exceeding the disk quota introspection)
+Updated report ForwarderLevel - Forwarders connecting to a single endpoint for extended periods (and the UF level version) to use the hostname/name parameters
+Renamed alert IndexerLevel - ERROR from linebreaker to IndexerLevel - Data parsing error
+New report SearchHeadLevel - Users exceeding the disk quota introspection
+New report SearchHeadLevel - Users exceeding the disk quota introspection cleanup
+
+### 2.3.8
+New reports for diagnosing forwarder issues, alerts around bucket corruption and peer connection failures
+New dashboards for troubleshooting sourcetypes or buckets rolled per day
+Updated all alerts with an investigationQuery to use `index=*` explicitly rather than assume the admin has all indexes listed in the indexes searched by default list
+
+Update summary:
+New alert - IndexerLevel - Detect bucket corruption
+New alert - SearchHeadLevel - Indexer Peer Connection Failures
+Updated alert ClusterMasterLevel - Per index status to 5 minute intervals for certification purposes
+Renamed alert IndexerLevel - Detect bucket corruption to a report IndexerLevel - Report on bucket corruption (refer to IndexerLevel - Unclean Shutdown - Fsck for an alert)
+New report - ForwarderLevel - Forwarders connecting to a single endpoint for extended periods
+New report - ForwarderLevel - Forwarders connecting to a single endpoint for extended periods UF level
+New report - SearchHeadLevel - Determine query scan density
+New report - SearchHeadLevel - Detect searches hitting corrupt buckets
+New dashboard - Issues per sourcetype, a combination of timestamp parsing, future based and past data searches to look at a single problematic sourcetype
+New dashboard - Rolled buckets by index, a dashboard to assist with determining which index is rolling the most buckets
+
+### 2.3.5
+Update summary:
+Updated IndexerLevel - Cold data location approaching size limits to handle only maxTotalDataSizeMB being set
+Updated Future Dated Events that appeared in the last week to use +10y, as 7.1 rejects +20y
+Corrected AllSplunkEnterpriseLevel - TCP or SSL Config Issue to remove an extra ( symbol
+Corrected SearchHeadLevel - User - Dashboards searching all indexes macro version to refer to the correct lookup name
+Corrected the dashboard for troubleshooting indexer CPU to handle a standalone server
+Inclusion of alternative app icons to work in 7.1
+
+### 2.3.4
+Update summary:
+Updated SearchHeadLevel - Scheduled searches not specifying an index to exclude 1 additional type of search
+Updated SearchHeadLevel - KVStore Or Conf Replication Issues Are Occurring to detect a disconnected member scenario
+Updated Troubleshooting indexer CPU & drilldown dashboards to include commas and the search head field (to make it easier to update to search head instead of indexer hosts)
+
+### 2.3.3
+Update summary:
+New alert SearchHeadLevel - Disabled modular inputs are running
+Updated SearchHeadLevel - Detect MongoDB errors timechart to have no limit on the number of hosts involved
+Updated the shutdown macros to find one additional scenario
+
+### 2.3.2
+Due to resourcing issues on the search heads this includes a few warnings/errors related to performance issues
+
+Update summary:
+New alert AllSplunkEnterpriseLevel - Splunk Servers with resource starvation
+New alert IndexerLevel - S2SFileReceiver Error
+New alert SearchHeadLevel - Captain Switchover Occurring
+Updated ForwarderLevel - Splunk Universal Forwarders that are time shifting to include "System time went backwards by..."
+Updated IndexerLevel - Failures To Parse Timestamp Correctly (excluding breaking issues) to show when the failure relates to being outside the acceptable time window
+Updated SearchHeadLevel - User - Dashboards searching all indexes to simplify the regex (ignore anything starting with a pipe symbol)
+Updated SearchHeadLevel - User - Dashboards searching all indexes macro version to simplify the regex (ignore anything starting with a pipe symbol)
+Corrected AllSplunkEnterpriseLevel - sendmodalert errors to not show random `savedsearch_names` when no match is found
+Corrected SearchHeadLevel - Alerts that have not fired an action in X days to only show alerts relevant to the current search head/cluster
+
+### 2.3.1
+Update summary:
+New alert AllSplunkEnterpriseLevel - Non-existent roles are assigned to users
+New alert IndexerLevel - Index not defined
+New alert IndexerLevel - Search Failures
+New alert SearchHeadLevel - Scheduled searches not specifying an index macro version (detect lack of index= with 1 level of macro expansion)
+New alert SearchHeadLevel - Saved Searches with privileged owners and excessive write perms (detect 1 way of accessing data outside your level of access)
+New alert SearchHeadLevel - User - Dashboards searching all indexes macro version
+New report SearchHeadLevel - Macro report (required by the "macro version" alerts)
+Updated AllSplunkEnterpriseLevel - TCP or SSL Config Issue to include an additional scenario as reported by a customer
+Updated SearchHeadLevel - Scheduled searches not specifying an index to not find searches with macros and to include an example query
+Updated SearchHeadLevel - Scheduled Searches That Cannot Run to make the message field accurate in all situations
+Updated SearchHeadLevel - User - Dashboards searching all indexes to include an example query to find indexes, and to not find macro-based queries
+Corrected AllSplunkLevel - Unable To Distribute to Peer
+Corrected IndexerLevel - Failures To Parse Timestamp Correctly (excluding breaking issues) to correctly exclude broken events & to handle newer 7.0.2 errors
+
+### 2.3.0
+Minor updates to a few alerts and a new alert
+
+Update summary:
+New alert AllSplunkEnterpriseLevel - Detect LDAP groups that no longer exist
+New alert ClusterMasterLevel - Per index status
+New report ClusterMasterLevel - Primary bucket count per peer
+Updated AllSplunkEnterpriseLevel - TCP or SSL Config Issue to find the most recent (not the oldest) example
+Updated AllSplunkEnterpriseLevel - Splunk Scheduler skipped searches and the reason to exclude the time window up to 10 minutes post-shutdown of an indexer
+Updated AllSplunkLevel - TCP Output Processor has paused the data flow to use a stats command instead of raw/host information
+Updated DeploymentServer - Unsupported attribute within DS config - to find the most recent (not the oldest) example
+Updated IndexerLevel - Failures To Parse Timestamp Correctly (excluding breaking issues) - to find the most recent (not the oldest) example
+Updated SearchHeadLevel - Detect MongoDB errors - mild tweak to the output data, added customisation macros
+Updated SearchHeadLevel - Scheduled Searches That Cannot Run - to detect errors in splunkd related to saved searches
+Corrected SearchHeadLevel - User - Dashboards searching all indexes - a newline resulted in it working in search but not via the scheduler!
+
+### 2.2
+Not released, combined with 2.3.0
+Attempt to reduce false alarms and improve investigationQuery searches
+Created macros for shutdown events for indexers/search heads/enterprise servers for excluding false alarms related to restarts
+
+Update summary:
+New macro `splunkadmins_shutdown_list`
+New macro `splunkadmins_shutdown_time`
+Updated AllSplunkEnterpriseLevel - Splunk Scheduler skipped searches and the reason - to use the shutdown macro
+Updated AllSplunkEnterpriseLevel - Splunk Scheduler excessive delays in executing search - to use the shutdown macro
+Updated AllSplunkLevel - TCP Output Processor has paused the data flow - to use the shutdown macro
+Updated AllSplunkLevel - Unable To Distribute to Peer - to use the shutdown macro
+Updated ForwarderLevel - Splunk forwarders are having issues with sending data to indexers - to use the shutdown macro
+Updated ForwarderLevel - SplunkStream Errors - to use the shutdown macro
+Updated ForwarderLevel - Unusual number of duplication alerts - to use the shutdown macro & changed the alert to fire on >10 results per host
+Updated IndexerLevel - Weekly Truncated Logs Report - hostnames wildcarded to deal with short names (for syslog, for example)
+Updated SearchHeadLevel - Detect MongoDB errors - timespan increased to 10 minutes as 5 minutes produces false alarms, and added the shutdown macro
+Updated SearchHeadLevel - KVStore Or Conf Replication Issues Are Occurring - to use the shutdown macro
+
+### 2.1
+Added macros which can be customised to the majority of alerts; this reduces the need to customise the alert itself and should make upgrading to new versions of the application easier...
+
+Update summary:
+New alert AllSplunkEnterpriseLevel - Unable to dispatch searches due to disk space
+New alert IndexerLevel - Unclean Shutdown - Fsck
+New macros - various macros introduced due to customer feedback about the requirement to customise the alerts
+Updated SearchHeadLevel - Users exceeding the disk quota to list the top 10 consumers of disk
+Updated SearchHeadLevel - Scheduled Searches That Cannot Run (to ignore the above)
+Updated IndexerLevel - Failures To Parse Timestamp Correctly (excluding breaking issues) to list sources per sourcetype/host
+Updated ForwarderLevel - Splunk Insufficient Permissions to Read Files to include a new macro, hint, investigationQuery & to improve the accuracy
+Updated SearchHeadLevel - Detect MongoDB errors (customer feedback, includes F/fatal errors now)
+README and description updates for searches
+
+### 2.0
+Multiple searches now have an "investigationQuery" in them; the idea is that you can copy and paste the output into a search window and see results relevant to the particular alert
+The last few releases have been attempting to reduce false alarms from alerts related to server restarts
+
+Update summary:
+New alert IndexerLevel - Cold data location approaching size limits
+New/renamed alert AllSplunkEnterpriseLevel - Splunk Scheduler excessive delays in executing search
+New/renamed alert AllSplunkEnterpriseLevel - Splunk Scheduler skipped searches and the reason
+Updated AllSplunkLevel - Time skew on Splunk Servers
+Updated IndexerLevel - Future Dated Events that appeared in the last week
+Updated IndexerLevel - Time format has changed multiple log types in one sourcetype
+Updated IndexerLevel - Failures To Parse Timestamp Correctly (excluding breaking issues)
+Updated IndexerLevel - Weekly Broken Events Report
+Updated IndexerLevel - Weekly Truncated Logs Report
+Updated IndexerLevel - Old data appearing in Splunk indexes
+Updated IndexerLevel - Valid Timestamp Invalid Parsed Time
+Updated IndexerLevel - Too many events with the same timestamp
+Updated IndexerLevel - Large multiline events using `SHOULD_LINEMERGE` setting
+Updated SearchHeadLevel - Users exceeding the disk quota
+Updated IndexerLevel - Indexer Queues May Have Issues (to be less sensitive to indexqueue issues)
+Corrected AllSplunkLevel - TCP Output Processor has paused the data flow
+Corrected ForwarderLevel - Splunk Universal Forwarders that are time shifting
+Removed AllSplunkEnterpriseLevel - Splunk Servers with time skew (replaced by Time skew on Splunk Servers)
+Removed SearchHead Level - Splunk Scheduler excessive delays in executing search (renamed)
+Removed SearchHeadLevel - Splunk Scheduler Skipped Searches and the reason (renamed)
+
+### 1.9
+New macro `splunkadmins_mongo_source`
+New alert IndexerLevel - Too many events with the same timestamp
+New alert SearchHeadLevel - Detect MongoDB errors
+New dashboard Data Model Status
+New dashboard Data Model Rebuild Monitor
+Updated AllSplunkEnterpriseLevel - Email Sending Failures to include the toaddress
+Updated SearchHeadLevel - Splunk Users Violating the Search Quota to detect an alternative log message
+Updated Scheduled searches not specifying an index
+Updated SearchHeadLevel - Scheduled Searches without a configured earliest and latest time
+Updated ForwarderLevel - File Too Small to checkCRC occurring multiple times, to handle spaces in the filename
+Updated ForwarderLevel - crcSalt or initCrcLength change may be required
+Updated IndexerLevel - Indexer Queues May Have Issues to be less sensitive
+Updated IndexerLevel - Splunk Indexers Losing Contact With Master for an additional scenario
+Updated AllSplunkEnterpriseLevel - ulimit on Splunk enterprise servers is below 8192 to improve emails
+Corrected AllSplunkLevel - Splunk forwarders that are not talking to the deployment server
+
+### 1.8
+New alert IndexerLevel - Peer will not return results due to outdated generation
+New alert SearchHeadLevel - Scheduled searches failing in cluster with 404 error
+Updated ForwarderLevel - File Too Small to checkCRC occurring multiple times to have the correct dispatch application
+Updated AllSplunkEnterpriseLevel - Email Sending Failures with the saved search name
+Updated IndexerLevel - Indexer replication queue issues to some peers to be less sensitive as this cannot be tuned in 7.0.0
+Updated AllSplunkLevel - Unable To Distribute to Peer to include the peer name
+Updated IndexerLevel - Indexer Queues May Have Issues to ensure it fires when necessary but is not too noisy (this may require tuning)
+Corrected ForwarderLevel - Bandwidth Throttling Occurring, this alert was not working as expected
+
+### 1.7
+New macro `splunkadmins_splunkuf_source`
+New alert "AllSplunkEnterpriseLevel - TCP or SSL Config Issue" for detecting when the listener ports fail to start on an HF/Indexer
+Updated macro splunkindexerhostsvalue to include `splunk_server=`
+Updated searches to use (`splunkadmins_splunkd_source`) in brackets so it looks valid when expanded (and allows a future OR/NOT statement to be added before or after with no unexpected side effects)
+Updated a few comments and improved some searches to narrow down the required hosts/sources/sourcetypes
+Removed unused macro splunkenterprisehostsvalue
+Removed hardcoded references to the location of the splunkd.log file and replaced them with the `splunkadmins_splunkd_source` macro
+Removed a few unnecessary fields/fixed some other minor issues within the file
+
+### 1.6
+Removed "Splunk Alert failures" and updated "AllSplunkEnterpriseLevel - sendmodalert errors", also updated "Time format has changed" alert to have more clear output via email + +### 1.5 +Updated Splunk Alert Failures alert and the Time format has changed alerts to have more clear output via email +Simplified "Scheduled Searches without a configured earliest and latest time", and "Scheduled searches not specifying an index" + +### 1.4 +Two new alerts LicenseMaster - Duplicated License Situation, DeploymentServer - Unsupported attribute within DS config +Simplified Scheduled Searches without a configured earliest and latest time, and Scheduled searches not specifying an index +Created a macro `splunkadmins_splunkd_source` for Windows users or others using non-standard Splunk installation directories + +### 1.0 to 1.3 +Creation of app, addition of icons and removal of email functionality from the app for Splunk certification purposes + +## Other +Icons made by [Freepik](http://www.freepik.com) from www.flaticon.com is licensed by [Creative Commons BY 3.0](http://creativecommons.org/licenses/by/3.0) + +### Misc testing notes for SearchHeadLevel - Detect changes to knowledge objects +calcfields: +/data/props/calcfields +/servicesNS/admin/search/data/props/calcfields (GUI goes via /manager/ first) +/services/data/props/calcfield +https://localhost:8089/servicesNS/nobody/search/configs/conf-props?count=0 <-- did not work/no data found + +Saved searches: +/servicesNS/nobody/search/configs/conf-savedsearche +/services/configs/conf-savedsearches +/services/saved/searches +/services/admin/savedsearch +/en-US/splunkd/__raw/servicesNS/admin/search/saved/searches/ + +dashboards: +/services/data/ui/views +/en-US/splunkd/__raw/servicesNS/admin/search/data/ui/views +/services/admin/views/ + +fieldaliases: +/servicesNS/admin/search/data/props/fieldaliases (GUI goes via /manager/ first) +/servicesNS/admin/search/data/props//fieldaliases +/services/admin//fieldaliases +/services/configs/conf-props + +field extractions: +/servicesNS/admin/search/data/props/extractions +/services/admin/props-extract +/services/configs/conf-props + +fieldtransforms: +/servicesNS/admin/search/data/transforms/extractions +/services/admin//transforms-extract +/services/configs/conf-transforms + +workflowactions: +/data/ui/workflow-actions +/services/admin//workflow-actions/TestWorkflow +/services/configs/conf-workflow_actions + +sourcetype renaming: +/data/props/sourcetype-rename +/admin//sourcetype-rename +/services/configs/conf-props + +tags: +/configs/conf-tags +/admin/tags +/saved/ntags +/saved/fvtags + +eventtypes: +/saved/eventtypes +/admin//eventtypes +/configs/conf-eventtypes + +navMenu: +/data/ui/nav +/admin/nav/ + +datamodel: +/datamodel/model +/configs/conf-datamodels +/admin//datamodeledit +/admin//datamodel-files + +kvstore: +/storage/collections/config +/configs/conf-collections +/admin//collections-conf + +/configs/conf-viewstates +Skipped + +times: +/data/ui/times +/configs/conf-times +/admin//conf-times + +UI panels: +/data/ui/panels +/configs/conf-panels + +automatic lookups: +/data/transforms/lookups +/admin//transforms-lookup +/services/configs/conf-transforms + +lookup definitions: +/data/props/lookups +/admin//props-lookup +/services/configs/conf-props + +macros: +/configs/conf-macros +/data/macros +/admin/macros diff --git a/apps/SplunkAdmins/default/app.conf b/apps/SplunkAdmins/default/app.conf new file mode 100644 index 00000000..e68de1e9 --- /dev/null +++ b/apps/SplunkAdmins/default/app.conf @@ -0,0 +1,22 
+#
+# Splunk app configuration file
+#
+
+[install]
+is_configured = 0
+
+[ui]
+is_visible = 1
+label = SplunkAdmins
+# allow 9.1 and above to use themes
+supported_themes = light,dark
+
+[launcher]
+author = Gareth Anderson
+description = Alerts and dashboards as described in the Splunk 2017 conf presentation How did you get so big?
+version = 4.0.1
+
+[package]
+id = SplunkAdmins
+check_for_updates = true
+
diff --git a/apps/SplunkAdmins/default/data/ui/nav/default.xml b/apps/SplunkAdmins/default/data/ui/nav/default.xml
new file mode 100644
index 00000000..8b833470
--- /dev/null
+++ b/apps/SplunkAdmins/default/data/ui/nav/default.xml
@@ -0,0 +1,583 @@
+
diff --git a/apps/SplunkAdmins/default/data/ui/views/ClusterMasterJobs.xml b/apps/SplunkAdmins/default/data/ui/views/ClusterMasterJobs.xml
new file mode 100644
index 00000000..59ce6c51
--- /dev/null
+++ b/apps/SplunkAdmins/default/data/ui/views/ClusterMasterJobs.xml
@@ -0,0 +1,107 @@
+
+ +
+ + + + -15m + now + + + + + 2m + +
+ + + Job Count + + + index=_internal `splunkadmins_clustermaster_oshost` sourcetype=splunkd `splunkadmins_splunkd_source` *CMRepJob running job | timechart span=$span$ count by job + $time.earliest$ + $time.latest$ + 1 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Fixup Jobs + + + index=_internal `splunkadmins_metrics_source` sourcetype=splunkd name=cmmaster_service `splunkadmins_clustermaster_oshost` group=subtask_counts +| timechart max(to_fix_gen), max(to_fix_rep_factor), max(to_fix_search_factor) span=$span$ + $time.earliest$ + $time.latest$ + 1 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
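Beyond the scheduled charts above, the same fixup backlog can be pulled on demand from the cluster manager. A sketch, assuming the cluster/master/fixup REST endpoint (cluster/manager/fixup on newer versions) and its level filter are available on your Splunk version; the field names shown may vary:

    | rest /services/cluster/master/fixup level=search_factor splunk_server=local
    | table title, "initial.reason", "latest.reason"

Run against the manager node, this lists buckets awaiting search-factor fixup together with the recorded reasons.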
diff --git a/apps/SplunkAdmins/default/data/ui/views/data_model_rebuild_monitor.xml b/apps/SplunkAdmins/default/data/ui/views/data_model_rebuild_monitor.xml new file mode 100644 index 00000000..c4ae8bdf --- /dev/null +++ b/apps/SplunkAdmins/default/data/ui/views/data_model_rebuild_monitor.xml @@ -0,0 +1,276 @@ +
+ + Originally based on the work at https://conf.splunk.com/files/2017/slides/running-enterprise-security-at-capacity-tuning-es-with-data-model-acceleration.pdf, modified to work without the macros and with corrected datamodel sizing (and misc tweaks). +
+ + + + | rest /services/admin/summarization by_tstats=t splunk_server=local count=0 +| eval datamodel=replace('summary.id',"DM_".'eai:acl.app'."_","") +| fields datamodel +| sort 100 + datamodel + + datamodel + datamodel + + + + acceleration.earliest_time + acceleration.earliest_time + + | rest /services/configs/conf-datamodels| search title=$dm$ | fields acceleration.earliest_time + 0 + + + true + +
+ + + +

$dm$ data model config

+ +
+
+ + + + + | rest /services/configs/conf-datamodels +| search title=$dm$ +| fields acceleration.earliest_time + @d + now + + + + + + + + + + + + + + + + + + | rest /services/configs/conf-datamodels +| search title=$dm$ +| fields acceleration.backfill_time + @d + now + + + + + + + + + + + + + + + + + + | rest /services/admin/summarization by_tstats=t splunk_server=local count=0 +| eval datamodel=replace('summary.id',"DM_".'eai:acl.app'."_","") +| fields summary.complete, datamodel +| rename summary.complete AS complete +| search datamodel=$dm$ +| eval complete(%)=round(complete*100,1)."%" +| fields complete(%) + 0.000 + + + + + + + + + + + + + + + + + + + | rest /services/configs/conf-datamodels +| search title=$dm$ +| fields acceleration.max_concurrent + @d + now + + + + + + + + + + + + + + + + + + | rest /services/configs/conf-datamodels +| search title=$dm$ +| fields acceleration.max_time + @d + now + + + + + + + + + + + + + + + + + + ```The author's original attempt of | `datamodel("Splunk_Audit", "Datamodel_Acceleration | `drop_dm_object_name("Datamodel_Acceleration")` just did not appear to show accurate numbers when compared to the filesystem of the indexers +The previous attempt at this number via | rest "/services/admin/introspection--disk-objects--summaries?count=-1" ... worked fine *unless* there were multiple search head GUIDs in the introspection data in which case it seems to return 1 set only (resulting in highly inaccurate numbers in some cases) + Now querying the introspection data instead as that provides consistently accurate numbers``` + index=_introspection `indexerhosts` component=summaries "data.name"=*$dm$ +| stats latest(data.total_size) AS size by data.search_head_guid, data.related_indexes_count, data.related_indexes, host +| stats sum(size) AS size + @d + now + + + + + + + + + + + + + + + + + + + + + +

$dm$ data model acceleration state

+ +
+
+ + + $dm$ event counts - Monitor lag and backfill + + Backfill view over the last 2 hours + + | tstats prestats=t summariesonly=t allow_old_summaries=t count from datamodel=$dm$ by _time span=10s +| timechart count span=10s + -2h + now + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Backfill view over time range of DM acceleration (and -1w) + + |tstats prestats=t allow_old_summaries=t summariesonly=t count from datamodel=$dm$ by _time span=4h| timechart count span=4h + $earliest_token$-1w + now + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + $dm$ recent acceleration jobs + + index=_internal source=*scheduler.log _ACCELERATE_DM_*$dm$_ACCELERATE_ | eval scheduled=strftime(scheduled_time,"%c") +| stats values(scheduled) as scheduled, values(scheduled_time) as scheduled_time, list(status) as statuses, values(run_time) as run_time by savedsearch_name sid | sort - scheduled_time | +eval done=if(isnull(run_time),"running","done") +| eval run_time=tostring(if(isnull(run_time),now()-scheduled_time,run_time),"duration") | fields - scheduled_time savedsearch_name sid + @d + now + + + + + + +
+
+
+
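As a quick cross-check of what the backfill panels above report, accelerated coverage can be compared with raw coverage by toggling summariesonly; Network_Traffic is a hypothetical model name here:

    | tstats summariesonly=t count AS accelerated_events from datamodel=Network_Traffic where earliest=-24h@h

Re-running with summariesonly=f gives the total event count for the same window; a large gap suggests the acceleration is still backfilling or is falling behind.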
diff --git a/apps/SplunkAdmins/default/data/ui/views/data_model_status.xml b/apps/SplunkAdmins/default/data/ui/views/data_model_status.xml new file mode 100644 index 00000000..d06f3bb2 --- /dev/null +++ b/apps/SplunkAdmins/default/data/ui/views/data_model_status.xml @@ -0,0 +1,168 @@ +
+ + Originally based on the work at https://conf.splunk.com/files/2017/slides/running-enterprise-security-at-capacity-tuning-es-with-data-model-acceleration.pdf, modified to work without the macros (and misc tweaks) +
+ + + + -4h@m + now + + +
+ + + + Skipped searches ($timepicker1.earliest$ to $timepicker1.latest$) + + index=_internal `searchheadhosts` sourcetype=scheduler status="skipped" +| eval type=if(match(savedsearch_name,"^_ACCELERATE_"),"DM","non-DM") +| eval reason = if(isnull(reason) OR reason == "", "none", reason) +| eval combo=type . " - " . reason +| timechart span=5m count by combo + $timepicker1.earliest$ + $timepicker1.latest$ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Deferred & Skipped searches ($timepicker1.earliest$ to $timepicker1.latest$) + + index=_internal `searchheadhosts` sourcetype=scheduler status=continued OR status=skipped +| eval type=if(match(savedsearch_name,"^_ACCELERATE_"),"DM","non-DM") +| eval status=replace(status,"continued","deferred") +| eval combo=type . "-" . status +| timechart span=5m count by combo + $timepicker1.earliest$ + $timepicker1.latest$ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Top Accelerations by Run Duration (on this search head / cluster) + + | rest /services/admin/summarization by_tstats=t splunk_server=local count=0 +| eval datamodel=replace('summary.id',(("DM_" . 'eai:acl.app') . "_"),"") +| join max=1 overwrite=1 type=left usetime=0 datamodel + [| rest /services/data/models splunk_server=local count=0 + | table title acceleration.cron_schedule eai:digest + | rename title as datamodel + | rename "acceleration.cron_schedule" as cron] +| table datamodel eai:acl.app summary.access_time summary.is_inprogress summary.size summary.latest_time summary.complete summary.buckets_size summary.buckets cron summary.last_error summary.time_range summary.id summary.mod_time eai:digest summary.earliest_time summary.last_sid summary.access_count +| rename "eai:digest" as digest, "summary.earliest_time" as earliest, "summary.id" as summary_id, "summary.latest_time" as latest, "summary.time_range" as retention +| rename "eai:acl.app" as app, "summary.access_count" as access_count, "summary.access_time" as access_time, "summary.buckets" as buckets, "summary.buckets_size" as buckets_size, "summary.complete" as complete, "summary.is_inprogress" as is_inprogress, "summary.last_error" as last_error, "summary.last_sid" as last_sid, "summary.mod_time" as mod_time, "summary.size" as size, "summary.*" as "*", "eai:acl.*" as "*" +| sort datamodel +| rename access_count as "Datamodel_Acceleration.access_count", access_time as "Datamodel_Acceleration.access_time", app as "Datamodel_Acceleration.app", buckets as "Datamodel_Acceleration.buckets", buckets_size as "Datamodel_Acceleration.buckets_size", complete as "Datamodel_Acceleration.complete", cron as "Datamodel_Acceleration.cron", datamodel as "Datamodel_Acceleration.datamodel", digest as "Datamodel_Acceleration.digest", earliest as "Datamodel_Acceleration.earliest", is_inprogress as "Datamodel_Acceleration.is_inprogress", last_error as "Datamodel_Acceleration.last_error", last_sid as "Datamodel_Acceleration.last_sid", latest as "Datamodel_Acceleration.latest", mod_time as "Datamodel_Acceleration.mod_time", retention as "Datamodel_Acceleration.retention", size as "Datamodel_Acceleration.size", summary_id as "Datamodel_Acceleration.summary_id" +| rename "Datamodel_Acceleration.access_count" as access_count, "Datamodel_Acceleration.access_time" as access_time, "Datamodel_Acceleration.app" as app, "Datamodel_Acceleration.buckets" as buckets, "Datamodel_Acceleration.buckets_size" as buckets_size, "Datamodel_Acceleration.complete" as complete, "Datamodel_Acceleration.cron" as cron, 
"Datamodel_Acceleration.datamodel" as datamodel, "Datamodel_Acceleration.digest" as digest, "Datamodel_Acceleration.earliest" as earliest, "Datamodel_Acceleration.is_inprogress" as is_inprogress, "Datamodel_Acceleration.last_error" as last_error, "Datamodel_Acceleration.last_sid" as last_sid, "Datamodel_Acceleration.latest" as latest, "Datamodel_Acceleration.mod_time" as mod_time, "Datamodel_Acceleration.retention" as retention, "Datamodel_Acceleration.size" as size, "Datamodel_Acceleration.summary_id" as summary_id, "Datamodel_Acceleration.*" as "*" +| join max=1 overwrite=1 type=outer usetime=0 last_sid + [| rest splunk_server=* count=0 /services/search/jobs reportSearch=summarize* + | rename sid as last_sid + | fields last_sid,runDuration] +| eval "size(MB)"=round((size / 1048576),1) +| eval "retention(days)"=if((retention == 0),"unlimited",(retention / 86400)) +| eval "complete(%)"=round((complete * 100),1) +| eval "runDuration(s)"=round(runDuration,1) +| sort 18 - runDuration +| table datamodel,runDuration +| eval concurrent_threshold=300 +| eval deferred_threshold=600 +| eval skipped_threshold=900 + 0.000 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + All skipped scheduled searches ($timepicker1.earliest$ to $timepicker1.latest$) + + index=_internal `searchheadhosts` sourcetype=scheduler status="skipped" +| table _time status savedsearch_name +| sort - _time + $timepicker1.earliest$ + $timepicker1.latest$ + + + +
+
+
+
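To turn the skipped-search panels above into a per-search ratio, something along these lines works against the same scheduler logs (status=success is assumed to be the success value logged by your version):

    index=_internal `searchheadhosts` sourcetype=scheduler (status=success OR status=skipped)
    | stats count(eval(status=="skipped")) AS skipped, count AS total by savedsearch_name
    | eval skip_pct=round(100*skipped/total,2)
    | sort - skip_pct

Searches at the top of this list are the first candidates for schedule or cron adjustments.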
diff --git a/apps/SplunkAdmins/default/data/ui/views/detect_excessive_search_use.xml b/apps/SplunkAdmins/default/data/ui/views/detect_excessive_search_use.xml new file mode 100644 index 00000000..4cbcdfaf --- /dev/null +++ b/apps/SplunkAdmins/default/data/ui/views/detect_excessive_search_use.xml @@ -0,0 +1,120 @@ +
+ + Detect repeated search use for the same search query by a particular user during a period of time +
+ + + + -4h@m + now + + + + + 10m + +
+ + + Searches occurring more often than expected in the audit logs + + Click any line for drilldown per-username + + index=_audit info=granted "search='" NOT "savedsearch_name=\"Threat - Correlation Searches - Lookup Gen\"" NOT "savedsearch_name=\"Bucket Copy Trigger\"" NOT "search='| copybuckets" NOT "search='search index=_telemetry sourcetype=splunk_telemetry | spath" NOT "savedsearch_name=\"_ACCELERATE_*" +| rex ", search='(?P<search>[\S+\s+]+?)', " +| regex search!="\|\s+(rest|inputlookup|makeresults|tstats count AS \"Count of [^\"]+\"\s+ from sid=)" +| rex "apiEndTime='[^,]+, savedsearch_name=\"(?P<savedsearch_name>[^\"]+)" +| eval apiEndTime=strptime(apiEndTime, "'%a %B %d %H:%M:%S %Y'"), apiStartTime=strptime(apiStartTime, "'%a %B %d %H:%M:%S %Y'") +| eval timePeriod=apiEndTime-apiStartTime +| bin _time span=$span$ +| stats count, values(host) AS hostList, values(savedsearch_name) AS savedSearchName, values(ttl) AS ttl by search, user, _time, timePeriod +| eval frequency = ceil((10*60)/timePeriod) +| fillnull frequency +| where count>4 AND count>frequency +| eval timePeriod=tostring(timePeriod,"duration") +| stats sum(count) AS count, max(count) AS "maxCountPerSpan", values(user) AS userList, values(hostList) AS hostList, values(savedSearchName) AS savedSearchName, values(ttl) AS ttl, earliest(_time) AS firstSeen, latest(_time) AS mostRecent, values(timePeriod) AS timePeriods by search +| eval firstSeen=strftime(firstSeen, "%+"), mostRecent=strftime(mostRecent, "%+") +| eval search=substr(search,0,60) +| sort - count + $time.earliest$ + $time.latest$ + 1 + + + + + + + + + + + $row.userList$ + +
+
+
+ + + Results from access logs for $username$ + + Note: the cluster command is in use; introspection data may give a better list of all dashboards in use + + index=_internal (sourcetype=splunkd_access (method="GET" AND "/services/search/jobs/export") OR method="POST") OR (sourcetype=splunkd_ui_access method=POST "/report?" OR "/search?" OR "/search/jobs" OR "/servicesNS/*/*/search/jobs" OR "/saved/searches" NOT "/search/parser HTTP" NOT "/user-prefs/data/user-prefs/") OR (sourcetype=splunkd_ui_access method=GET "/app/" NOT "/search HTTP" NOT "/dashboards HTTP" NOT "/alerts HTTP" NOT "/reports HTTP") user IN ($username$) +| cluster t=0.95 showcount=true +| rex field=uri "/servicesNS/[^/]+/(?P<app>[^/]+)" +| rex field=uri "/[^/]+/app/(?P<app>[^/]+)/(?P<dashboard_name>[^/\?]+)" +| sort - cluster_count +| table cluster_count, app, uri_path, user, dashboard_name, clientip, sourcetype + $time.earliest$ + $time.latest$ + + + +
+
+
+ + + Introspection data for this $username$ + + Click for drilldown + + index=_introspection `indexerhosts` sourcetype=splunk_resource_usage data.search_props.sid::* data.search_props.user IN ($username$) +| eval mem_used = 'data.mem_used' +| eval app = 'data.search_props.app' +| eval elapsed = 'data.elapsed' +| eval label = 'data.search_props.label' +| eval type = 'data.search_props.type' +| eval mode = 'data.search_props.mode' +| eval user = 'data.search_props.user' +| eval cpuperc = 'data.pct_cpu' +| eval search_head = 'data.search_props.search_head' +| eval read_mb = 'data.read_mb' +| eval provenance='data.search_props.provenance' +| eval label=coalesce(label, provenance) +| eval sid='data.search_props.sid' +| rex field=sid "^remote_[^_]+_(?P<sid>.*)" +| eval sid = "'" . sid . "'" +| fillnull search_head value="*" +| stats max(elapsed) as runtime max(mem_used) as mem_used earliest(_time) as searchStartTime, sum(cpuperc) AS totalCPU, avg(cpuperc) AS avgCPU, max(read_mb) AS read_mb, values(sid) AS sids by type, mode, app, user, label, host, search_head, data.pid +| bin searchStartTime span=1m +| stats dc(sids) AS count, sum(totalCPU) AS totalCPU, sum(mem_used) AS totalMemUsed, max(runtime) AS maxRunTime, avg(runtime) AS avgRuntime, avg(avgCPU) AS avgCPUPerIndexer, sum(read_mb) AS totalReadMB, values(sids) AS sids by searchStartTime, type, mode, app, user, search_head, label +| eval maxduration = tostring(maxRunTime, "duration"), averageduration = tostring(avgRuntime, "duration") +| eval Started = strftime(searchStartTime,"%+") +| table Started, count, user, app, label, averageduration, maxduration, search_head, sids, mode, type + $time.earliest$ + $time.latest$ + + + + + ["Started","count","user","app","label","averageduration","maxduration","mode","type"] + + /app/SplunkAdmins/troubleshooting_resource_usage_per_user_drilldown?form.username=$username$&form.sid=$row.sids$&form.app=$row.app$&form.host=*&form.label=*&form.time.earliest=$time.earliest$&form.time.latest=$time.latest$ + +
+
+
+
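For reference, the flagging rule in the first panel computes frequency = ceil((10*60)/timePeriod) per search and window, i.e. how many runs of that search's own time window would fit into 10 minutes, and keeps only rows where count exceeds both 4 and that frequency. A standalone check of the arithmetic (makeresults is purely illustrative):

    | makeresults
    | eval timePeriod=300
    | eval frequency=ceil((10*60)/timePeriod)
    | eval flagged=if(6>4 AND 6>frequency, "flagged", "ok")

For a search covering a 5-minute window this gives frequency=2, so a search seen 6 times within one span would be flagged.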
diff --git a/apps/SplunkAdmins/default/data/ui/views/heavy_forwarder_analysis.xml b/apps/SplunkAdmins/default/data/ui/views/heavy_forwarder_analysis.xml new file mode 100644 index 00000000..74ab05e5 --- /dev/null +++ b/apps/SplunkAdmins/default/data/ui/views/heavy_forwarder_analysis.xml @@ -0,0 +1,634 @@ +
+ + As found on https://drive.google.com/file/d/1zvMKrFkk6wzmeXS1r69-GYfEbIdT_TVX/view from https://conf.splunk.com/files/2024/slides/PLA1509B.pdf / https://conf.splunk.com/files/2024/recordings/PLA1509B.mp4 +
+ + + + -15m + now + + + + + hostname + hostname + + index=_internal group=tcp*_connections sourcetype=splunkd +fwdType=full +| stats count by hostname fwdType + -24h@h + now + + +
+ + + $host1$ Queues-Pipelines Fill perc 90% - if high check thruput not throttled + + + index=_internal sourcetype=splunkd group=queue host=$host1$ (name=tcpin_queue OR name=splunktcpin OR name=parsingqueue OR name=aggqueue OR name=typingqueue OR name=indexqueue OR name=tcpout*) + | eval ingest_pipe = if(isnotnull(ingest_pipe), ingest_pipe, "0") + | search ingest_pipe=* + | eval max=if(isnotnull(max_size_kb),max_size_kb,max_size) + | eval curr=if(isnotnull(current_size_kb),current_size_kb,current_size) + | eval fill_perc=round((curr/max)*100,2) + | eval name=host."-".name."-".ingest_pipe + | timechart span=1m Perc90(fill_perc) by name useother=false limit=0 usenull=f + $time1.earliest$ + $time1.latest$ + 1 + + + index=_internal sourcetype=splunkd host=$host1$ (shutdownhandler complete) OR (loader Splunkd starting build) OR (request state change from=RUN to=SHUTDOWN_SIGNALED) OR (request state change from=SHUTDOWN_IN_PROGRESS to=SHUTDOWN_COMPLETE) OR (loader Splunkd starting build) OR (my GUID is) OR (All pipelines finished) NOT(Queued job) +| transaction startswith=finished endswith=starting maxspan=15min host keepevicted=true +| eval annotation_label=case(searchmatch("new generated"), "first startup", + (searchmatch("complete") OR searchmatch("signalled")) AND searchmatch("NOT starting"), "graceful shutdown", + (searchmatch("complete") OR searchmatch("signalled")) AND searchmatch("starting"), "graceful restart",1=1, "ungraceful restart")." ".host, annotation_category="restart", annotation_color="#FBB117" +| table _time ann* + $time1.earliest$ + $time1.latest$ + 1 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + $host1$ Max Thruput (kbps) by ingest_pipe If =< 256kbps Check FWD is Limited to 256kbps + + + index=_internal host=$host1$ group=thruput name=cooked_output + | eval ingest_pipe = if(isnotnull(ingest_pipe), ingest_pipe, "0") + | timechart span=1m max(instantaneous_kbps) as max_instantaneous_kbps by ingest_pipe + $time1.earliest$ + $time1.latest$ + 1 + + + + + + + + + + + + + + + + + + + + + + + + Connected to IDXs based on tcp connections + + + index=_internal group=tcp*_connections sourcetype=splunkd (host=$host1$ OR hostname=$host1$) NOT lastIndexer=None| timechart span=1sec count by lastIndexer usenull=f + $time1.earliest$ + $time1.latest$ + 1 + + + + + + + + + component="AutoLoadBalancedConnectionStrategy" + + + index=_internal sourcetype=splunkd (host=$host1$) component="AutoLoadBalancedConnectionStrategy" |timechart minspan=1sec count by idx usenull=f useother=false + $time1.earliest$ + $time1.latest$ + 1 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + $host1$ Max KB/sec by index-pipe + + + index=_internal host=$host1$ source="*metrics.log*" group=per_index_thruput NOT series=_* + | eval ingest_pipe = if(isnotnull(ingest_pipe), ingest_pipe, "0") + | eval index-pipe=series."-".ingest_pipe + | timechart minspan=30sec max(kbps) as "Max KB/sec" by index-pipe useother=f usenull=f limit=0 + $time1.earliest$ + $time1.latest$ + 1 + + + index=_internal sourcetype=splunkd host=$host1$ (shutdownhandler complete) OR (loader Splunkd starting build) OR (request state change from=RUN to=SHUTDOWN_SIGNALED) OR (request state change from=SHUTDOWN_IN_PROGRESS to=SHUTDOWN_COMPLETE) OR (loader Splunkd starting build) OR (my GUID is) OR (All pipelines finished) NOT(Queued job) +| transaction startswith=finished endswith=starting maxspan=15min host keepevicted=true +| eval annotation_label=case(searchmatch("new generated"), "first startup", + 
(searchmatch("complete") OR searchmatch("signalled")) AND searchmatch("NOT starting"), "graceful shutdown", + (searchmatch("complete") OR searchmatch("signalled")) AND searchmatch("starting"), "graceful restart",1=1, "ungraceful restart")." ".host, annotation_category="restart", annotation_color="#FBB117" +| table _time ann* + $time1.earliest$ + $time1.latest$ + 1 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + $host1$ Max KB/sec by sourcetype-pipe + + + index=_internal host=$host1$ source="*metrics.log*" group=per_sourcetype_thruput NOT series=_* + | eval ingest_pipe = if(isnotnull(ingest_pipe), ingest_pipe, "0") + | eval sourcetype-pipe=series."-".ingest_pipe + | timechart minspan=30sec max(kbps) as "Max KB/sec" by sourcetype-pipe useother=f usenull=f limit=0 + $time1.earliest$ + $time1.latest$ + 1 + + + + + + + + + + $host1$ Average kbps by ingest_pipe + + + index=_internal host=$host1$ group=thruput name=cooked_output + | eval ingest_pipe = if(isnotnull(ingest_pipe), ingest_pipe, "0") + | timechart span=1m max(average_kbps) by ingest_pipe + $time1.earliest$ + $time1.latest$ + 1 + + + index=_internal sourcetype=splunkd host=$host1$ (shutdownhandler complete) OR (loader Splunkd starting build) OR (request state change from=RUN to=SHUTDOWN_SIGNALED) OR (request state change from=SHUTDOWN_IN_PROGRESS to=SHUTDOWN_COMPLETE) OR (loader Splunkd starting build) OR (my GUID is) OR (All pipelines finished) NOT(Queued job) +| transaction startswith=finished endswith=starting maxspan=15min host keepevicted=true +| eval annotation_label=case(searchmatch("new generated"), "first startup", + (searchmatch("complete") OR searchmatch("signalled")) AND searchmatch("NOT starting"), "graceful shutdown", + (searchmatch("complete") OR searchmatch("signalled")) AND searchmatch("starting"), "graceful restart",1=1, "ungraceful restart")." ".host, annotation_category="restart", annotation_color="#FBB117" +| table _time ann* + $time1.earliest$ + $time1.latest$ + 1 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + $host1$ Max kbps by ingest_pipe If < 256kbps Check FWD is Limited to 256kbps + + + index=_internal host=$host1$ group=thruput name=cooked_output + | eval ingest_pipe = if(isnotnull(ingest_pipe), ingest_pipe, "0") + | eval UFdefaultkbps=256 + | timechart span=1m max(UFdefaultkbps) as UFdefaultkbps max(instantaneous_kbps) as max_instantaneous_kbps by ingest_pipe + $time1.earliest$ + $time1.latest$ + 1 + + + index=_internal sourcetype=splunkd host=$host1$ (shutdownhandler complete) OR (loader Splunkd starting build) OR (request state change from=RUN to=SHUTDOWN_SIGNALED) OR (request state change from=SHUTDOWN_IN_PROGRESS to=SHUTDOWN_COMPLETE) OR (loader Splunkd starting build) OR (my GUID is) OR (All pipelines finished) NOT(Queued job) +| transaction startswith=finished endswith=starting maxspan=15min host keepevicted=true +| eval annotation_label=case(searchmatch("new generated"), "first startup", + (searchmatch("complete") OR searchmatch("signalled")) AND searchmatch("NOT starting"), "graceful shutdown", + (searchmatch("complete") OR searchmatch("signalled")) AND searchmatch("starting"), "graceful restart",1=1, "ungraceful restart")." 
".host, annotation_category="restart", annotation_color="#FBB117" +| table _time ann* + $time1.earliest$ + $time1.latest$ + 1 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + $host1$ % CPU by pipe_name_processor + + + index=_internal host=$host1$ source="*metrics.log*" sourcetype=splunkd group=pipeline NOT processor=sendout + | eval ingest_pipe = if(isnotnull(ingest_pipe), ingest_pipe, "0") + | eval pipe_name_processor=ingest_pipe."-".name."-".processor + | timechart minspan=30s per_second(eval(cpu_seconds*100)) AS pctCPU by pipe_name_processor useother=false limit=0 + $time1.earliest$ + $time1.latest$ + 1 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + $host1$ executes by pipe_name_processor + + + index=_internal host=$host1$ source="*metrics.log*" sourcetype=splunkd group=pipeline NOT processor=sendout + | eval ingest_pipe = if(isnotnull(ingest_pipe), ingest_pipe, "0") + | eval pipe_name_processor=ingest_pipe."-".name."-".processor + | timechart minspan=30s per_second(executes) AS executes by pipe_name_processor useother=false limit=0 + $time1.earliest$ + $time1.latest$ + 1 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + $host1$ cpu_seconds, executes name processor + + + index=_internal sourcetype=splunkd host=$host1$ Metrics TERM(group=pipeline) NOT TERM(processor=sendout) NOT TERM(processor=readerin) +| bucket _time span=1m +| fields cpu_seconds, executes name processor +| eval name_rocessor=name."-".processor +|timechart sum(cpu_seconds) as cpu_seconds sum(executes) as executes by name_rocessor useother=f + $time1.earliest$ + $time1.latest$ + 1 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Check if TCPout Groups and Queues Dropping Events + + + index=_internal host=$host1$ component=TcpOutputProc sourcetype=splunkd "TcpOutputProc - Queue for group * has" +[| tstats min(_time) as earliest where (index=_internal sourcetype=splunkd)] +[| tstats max(_time) as latest where (index=_internal sourcetype=splunkd)] +| rex field=event_message "Queue for group (?<tcpout_group>.*) has (?<queue_action>.*) events" +| eval group_action=tcpout_group."-".queue_action +| stats sparkline count by tcpout_group queue_action + $time1.earliest$ + $time1.latest$ + 1 + + + + + + + + + +
+
+ + The TCP output processor has paused the data flow + + + index=_internal host=$host1$ component=TcpOutputProc sourcetype=splunkd event_message="The TCP output processor has paused the data flow*" +| rex field=event_message "Forwarding to output group (?<tcpout_group>.*) has been blocked for (?<blocked_for_seconds>.*) seconds" +| stats sparkline(max(blocked_for_seconds),5m) as blocked_for_seconds last(_time) as _time min(blocked_for_seconds) as min_blocked_seconds max(blocked_for_seconds) as max_blocked_seconds by tcpout_group host + $time1.earliest$ + $time1.latest$ + 1 + + + + + + + + + +
+
+ + $host1$ Blocking + + + index=_internal sourcetype=splunkd host=$host1$ (log_level=ERROR AND ("TcpInputProc - Error encountered for connection from" AND " Local side shutting down")) OR (log_level=INFO AND blocked=true) + | eval combined=max_size_kb."-".current_size_kb."-".current_size."-".largest_size +| stats dc(_time) as count values(max_size_kb) as max_size_kb values(current_size_kb) as current_size_kb values(current_size) as current_size values(largest_size) as largest_size earliest(_time) as firsttime latest(_time) as lasttime by host name combined +| convert ctime(lasttime) as LastTime, ctime(firsttime) as FirstTime +| addcoltotals labelfield=host label=Total +| fields - firsttime lasttime combined | where count>0 | sort - count + $time1.earliest$ + $time1.latest$ + 1 + + + +
+
+
+ + + $host1$ Queues : Current Size v Max Size (kb) + + + index=_internal source="*metrics.log*" group=queue host=$host1$ +| timechart values(current_size_kb) AS current_size_kb values(max_size_kb) as max_size_kb by name + $time1.earliest$ + $time1.latest$ + 1 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
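Several panels above compare observed thruput against 256KBps because that is the default maxKBps ceiling applied to universal forwarders. A minimal sketch of lifting it via limits.conf on the forwarder (the value of 1024 is illustrative; size it to your network and indexer capacity):

    # limits.conf on the forwarder
    [thruput]
    maxKBps = 1024

A restart of the forwarder is needed for the change to take effect, and a value of 0 removes the ceiling entirely.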
\ No newline at end of file diff --git a/apps/SplunkAdmins/default/data/ui/views/heavyforwarders_max_data_queue_sizes_by_name.xml b/apps/SplunkAdmins/default/data/ui/views/heavyforwarders_max_data_queue_sizes_by_name.xml new file mode 100644 index 00000000..e8d0383d --- /dev/null +++ b/apps/SplunkAdmins/default/data/ui/views/heavyforwarders_max_data_queue_sizes_by_name.xml @@ -0,0 +1,318 @@ +
+ +
+ + + + -4h@m + now + + + + + 1m + + + + `heavyforwarderhosts` + +
+ + + Parsing Queue Fill Size + + + index=_internal $hosts$ `splunkadmins_metrics_source` sourcetype=splunkd group=queue (name=parsingqueue) +| eval ingest_pipe = if(isnotnull(ingest_pipe), ingest_pipe, "none") | search ingest_pipe=* +| eval max=if(isnotnull(max_size_kb),max_size_kb,max_size) +| eval curr=if(isnotnull(current_size_kb),current_size_kb,current_size) +| eval fill_perc=round((curr/max)*100,2) +| eval combined = host . "_pipe_" . ingest_pipe +| timechart limit=20 useother=false span=$span$ max(fill_perc) by combined + $time.earliest$ + $time.latest$ + + + + + + + + + + + + + + + + Aggregation Queue Fill Size + + + index=_internal $hosts$ `splunkadmins_metrics_source` sourcetype=splunkd group=queue (name=aggqueue) +| eval ingest_pipe = if(isnotnull(ingest_pipe), ingest_pipe, "none") | search ingest_pipe=* +| eval max=if(isnotnull(max_size_kb),max_size_kb,max_size) +| eval curr=if(isnotnull(current_size_kb),current_size_kb,current_size) +| eval fill_perc=round((curr/max)*100,2) +| eval combined = host . "_pipe_" . ingest_pipe +| timechart limit=20 useother=false span=$span$ Max(fill_perc) by combined + $time.earliest$ + $time.latest$ + + + + + + + + + + + + Typing Queue Fill Size + + + index=_internal $hosts$ `splunkadmins_metrics_source` sourcetype=splunkd group=queue (name=typingqueue) +| eval ingest_pipe = if(isnotnull(ingest_pipe), ingest_pipe, "none") | search ingest_pipe=* +| eval max=if(isnotnull(max_size_kb),max_size_kb,max_size) +| eval curr=if(isnotnull(current_size_kb),current_size_kb,current_size) +| eval fill_perc=round((curr/max)*100,2) +| eval combined = host . "_pipe_" . ingest_pipe +| timechart limit=20 useother=false span=$span$ Max(fill_perc) by combined + $time.earliest$ + $time.latest$ + + + + + + + + + + + + Index Queue Size + + + index=_internal $hosts$ `splunkadmins_metrics_source` sourcetype=splunkd group=queue (name=indexqueue) +| eval name=case(name=="aggqueue","2 - Aggregation Queue", + name=="indexqueue", "4 - Indexing Queue", + name=="parsingqueue", "1 - Parsing Queue", + name=="typingqueue", "3 - Typing Queue") + | eval ingest_pipe = if(isnotnull(ingest_pipe), ingest_pipe, "none") | search ingest_pipe=* +| eval max=if(isnotnull(max_size_kb),max_size_kb,max_size) +| eval curr=if(isnotnull(current_size_kb),current_size_kb,current_size) +| eval fill_perc=round((curr/max)*100,2) +| eval combined = host . "_pipe_" . ingest_pipe +| timechart limit=20 useother=false span=$span$ Max(fill_perc) by combined + $time.earliest$ + $time.latest$ + 1 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + TCPOut Queue Sizes + + + index=_internal $hosts$ `splunkadmins_metrics_source` sourcetype=splunkd group=queue (name=tcpout_*) +| eval ingest_pipe = if(isnotnull(ingest_pipe), ingest_pipe, "none") | search ingest_pipe=* +| eval max=if(isnotnull(max_size_kb),max_size_kb,max_size) +| eval curr=if(isnotnull(current_size_kb),current_size_kb,current_size) +| eval fill_perc=round((curr/max)*100,2) +| eval combined = host . "_pipe_" . ingest_pipe +| timechart limit=20 useother=false span=$span$ max(fill_perc) by combined + $time.earliest$ + $time.latest$ + 1 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Blocked Forwarder Queues + + + index=_internal $hosts$ `splunkadmins_metrics_source` sourcetype=splunkd group=queue max_size_kb>0 | stats count(eval(isnotnull(blocked))) AS blockedCount, count by name, host, _time | eval percBlocked=(100/count)*blockedCount | eval hostQueue = host . "_" . 
name | where percBlocked>0 | timechart limit=50 useOther=false span=$span$ avg(percBlocked) by hostQueue + $time.earliest$ + $time.latest$ + 1 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + TcpOut KB per second per forwarder + + + index=_internal $hosts$ `splunkadmins_metrics_source` sourcetype=splunkd group=thruput name=cooked_output OR name=uncooked_output + | timechart useother=false span=$span$ limit=20 per_second(kb) by host + + $time.earliest$ + $time.latest$ + 1 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Forced closures on restart + + A potential indicator of data loss + + | tstats count where index=_internal sourcetype=splunkd $hosts$ `splunkadmins_splunkd_source` TERM("Forcing") groupby _time, host span=1s | timechart sum(count) by host + + $time.earliest$ + $time.latest$ + 1 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Forwarders that have stopped listening on all ports + + + index=_internal $hosts$ sourcetype=splunkd `splunkadmins_splunkd_source` TERM(WARN) TERM(Stopping) +| timechart count by host span=1m limit=99 + -24h@h + now + + + + + + + +
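The _v8 variant of this dashboard that follows swaps several metrics.log searches for tstats with the PREFIX() directive (available from Splunk 8.0), reading the kb= values as indexed terms rather than scanning raw events. The general pattern looks like this:

    | tstats sum(PREFIX(kb=)) AS total_kb where index=_internal sourcetype=splunkd TERM(group=thruput) groupby host, _time span=1m

On busy _internal indexes this is typically far cheaper than the equivalent raw search, which is the main reason for maintaining the two versions.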
diff --git a/apps/SplunkAdmins/default/data/ui/views/heavyforwarders_max_data_queue_sizes_by_name_v8.xml b/apps/SplunkAdmins/default/data/ui/views/heavyforwarders_max_data_queue_sizes_by_name_v8.xml new file mode 100644 index 00000000..4a05330e --- /dev/null +++ b/apps/SplunkAdmins/default/data/ui/views/heavyforwarders_max_data_queue_sizes_by_name_v8.xml @@ -0,0 +1,318 @@ +
+ +
+ + + + -4h@m + now + + + + + 1m + + + + `heavyforwarderhosts` + +
+ + + Parsing Queue Fill Size + + + index=_internal $hosts$ `splunkadmins_metrics_source` sourcetype=splunkd group=queue (name=parsingqueue) +| eval ingest_pipe = if(isnotnull(ingest_pipe), ingest_pipe, "none") | search ingest_pipe=* +| eval max=if(isnotnull(max_size_kb),max_size_kb,max_size) +| eval curr=if(isnotnull(current_size_kb),current_size_kb,current_size) +| eval fill_perc=round((curr/max)*100,2) +| eval combined = host . "_pipe_" . ingest_pipe +| timechart limit=20 useother=false span=$span$ max(fill_perc) by combined + $time.earliest$ + $time.latest$ + + + + + + + + + + + + + + + + Aggregation Queue Fill Size + + + index=_internal $hosts$ `splunkadmins_metrics_source` sourcetype=splunkd group=queue (name=aggqueue) +| eval ingest_pipe = if(isnotnull(ingest_pipe), ingest_pipe, "none") | search ingest_pipe=* +| eval max=if(isnotnull(max_size_kb),max_size_kb,max_size) +| eval curr=if(isnotnull(current_size_kb),current_size_kb,current_size) +| eval fill_perc=round((curr/max)*100,2) +| eval combined = host . "_pipe_" . ingest_pipe +| timechart limit=20 useother=false span=$span$ Max(fill_perc) by combined + $time.earliest$ + $time.latest$ + + + + + + + + + + + + Typing Queue Fill Size + + + index=_internal $hosts$ `splunkadmins_metrics_source` sourcetype=splunkd group=queue (name=typingqueue) +| eval ingest_pipe = if(isnotnull(ingest_pipe), ingest_pipe, "none") | search ingest_pipe=* +| eval max=if(isnotnull(max_size_kb),max_size_kb,max_size) +| eval curr=if(isnotnull(current_size_kb),current_size_kb,current_size) +| eval fill_perc=round((curr/max)*100,2) +| eval combined = host . "_pipe_" . ingest_pipe +| timechart limit=20 useother=false span=$span$ Max(fill_perc) by combined + $time.earliest$ + $time.latest$ + + + + + + + + + + + + Index Queue Size + + + index=_internal $hosts$ `splunkadmins_metrics_source` sourcetype=splunkd group=queue (name=indexqueue) +| eval name=case(name=="aggqueue","2 - Aggregation Queue", + name=="indexqueue", "4 - Indexing Queue", + name=="parsingqueue", "1 - Parsing Queue", + name=="typingqueue", "3 - Typing Queue") + | eval ingest_pipe = if(isnotnull(ingest_pipe), ingest_pipe, "none") | search ingest_pipe=* +| eval max=if(isnotnull(max_size_kb),max_size_kb,max_size) +| eval curr=if(isnotnull(current_size_kb),current_size_kb,current_size) +| eval fill_perc=round((curr/max)*100,2) +| eval combined = host . "_pipe_" . ingest_pipe +| timechart limit=20 useother=false span=$span$ Max(fill_perc) by combined + $time.earliest$ + $time.latest$ + 1 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + TCPOut Queue Sizes + + + index=_internal $hosts$ `splunkadmins_metrics_source` sourcetype=splunkd group=queue (name=tcpout_*) +| eval ingest_pipe = if(isnotnull(ingest_pipe), ingest_pipe, "none") | search ingest_pipe=* +| eval max=if(isnotnull(max_size_kb),max_size_kb,max_size) +| eval curr=if(isnotnull(current_size_kb),current_size_kb,current_size) +| eval fill_perc=round((curr/max)*100,2) +| eval combined = host . "_pipe_" . ingest_pipe +| timechart limit=20 useother=false span=$span$ max(fill_perc) by combined + $time.earliest$ + $time.latest$ + 1 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Blocked Forwarder Queues + + + index=_internal $hosts$ `splunkadmins_metrics_source` sourcetype=splunkd group=queue max_size_kb>0 | stats count(eval(isnotnull(blocked))) AS blockedCount, count by name, host, _time | eval percBlocked=(100/count)*blockedCount | eval hostQueue = host . "_" . 
name | where percBlocked>0 | timechart limit=50 useOther=false span=$span$ avg(percBlocked) by hostQueue + $time.earliest$ + $time.latest$ + 1 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + TcpOut KB per second per forwarder + + + | tstats prestats=true sum(PREFIX(kb=)) where index=_internal $hosts$ TERM(group=thruput) TERM(name=cooked_output) OR TERM(name=uncooked_output) sourcetype=splunkd `splunkadmins_metrics_source` groupby host, _time span=1s + | timechart aligntime=latest useother=false span=$span$ limit=20 per_second(kb=) by host + + $time.earliest$ + $time.latest$ + 1 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Forced closures on restart + + A potential indicator of data loss + + | tstats prestats=true count where index=_internal sourcetype=splunkd $hosts$ `splunkadmins_splunkd_source` TERM("Forcing") groupby _time, host span=1s | timechart count by host + + $time.earliest$ + $time.latest$ + 1 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Forwarders that have stopped listening on all ports + + + index=_internal $hosts$ sourcetype=splunkd `splunkadmins_splunkd_source` TERM(WARN) TERM(Stopping) +| timechart count by host span=1m limit=99 + -24h@h + now + + + + + + + +
diff --git a/apps/SplunkAdmins/default/data/ui/views/hec_performance.xml b/apps/SplunkAdmins/default/data/ui/views/hec_performance.xml new file mode 100644 index 00000000..a61059af --- /dev/null +++ b/apps/SplunkAdmins/default/data/ui/views/hec_performance.xml @@ -0,0 +1,217 @@ +
+ + Based on the original version from https://github.com/camrunr/hec_perf_report/blob/master/hec_perf_report.xml + + index=_introspection (`indexerhosts`) OR (`heavyforwarderhosts`) `splunkadmins_hec_metrics_source` http_event_collector_token +| bucket _time span=$dd_span$ +| stats sum(data.num_of_events) as Events sum(data.total_bytes_received) as Bytes by _time data.token_name + $timepicker.earliest$ + $timepicker.latest$ + 1 + $refreshinterval$ + + + index=_introspection (`indexerhosts`) OR (`heavyforwarderhosts`) `splunkadmins_hec_metrics_source` http_event_collector_token +| bucket _time span=$dd_span$ +| stats sum(data.num_of_events) as Events sum(data.total_bytes_received) as Bytes by _time host +| eval host=replace(host,"\..*","") + $timepicker.earliest$ + $timepicker.latest$ + 1 + $refreshinterval$ + +
+ + + + -4h@m + now + + + + + 1 minute + 5 minutes + 30 minutes + 1 hour + 1 day + 1min + + + + 15 + + + + 300 + +
+ + + Events/sec by host + + + timechart limit=$hostcount$ span=$dd_span$ per_second(Events) as Events/sec by host + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Bytes/sec by host + + + timechart limit=$hostcount$ span=$dd_span$ per_second(Bytes) as Bytes/sec by host + + + + + + + + + + + + + Events/sec by input/group + + + timechart span=$dd_span$ per_second(Events) as Events/sec by data.token_name + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Bytes/sec by input/group + + + timechart span=$dd_span$ per_second(Bytes) as Bytes/sec by data.token_name + + + + + + + + + + + + + HEC Batching Efficiency + + + $refreshinterval$ + index=_introspection (`indexerhosts`) OR (`heavyforwarderhosts`) `splunkadmins_hec_metrics_source` http_event_collector_token +| eval EpR='data.num_of_events'/'data.num_of_requests' +| bucket _time span=5m +| stats sum(data.num_of_events) as events avg(EpR) as events_per_POST sum(data.num_of_requests) as reqs sum(data.total_bytes_received) as Bytes by _time data.token_name +| eval reqs_per_sec=reqs/300, bytes_per_post=Bytes/reqs +| rename data.token_name as token_name +| stats sum(eval(Bytes/1024/1024)) as MBytes sum(events) as Events p50(events_per_POST) as events_per_post p50(bytes_per_post) as bytes_per_post p90(reqs_per_sec) as posts_per_sec by token_name +| eval MBytes = round(MBytes, 2), events_per_post=round(events_per_post,2), bytes_per_post=round(bytes_per_post,2), posts_per_sec=round(posts_per_sec,2) +| sort - posts_per_sec + $timepicker.earliest$ + $timepicker.latest$ + + + + + + [#DC4E41,#DC4E41,#F8BE34,#53A051] + 0,5,10 + + + [#53A051,#F8BE34,#DC4E41] + 10,50 + + + + + + + + +
+
+
+ + + If useACK is in use and num_of_requests_waiting_ack is high, this can be an issue (HEC tokens with useACK will stop allowing data through) + + + $refreshinterval$ + index=_introspection (`indexerhosts`) OR (`heavyforwarderhosts`) data.series=http_event_collector data.num_of_requests_waiting_ack=* sourcetype=http_event_collector_metrics +| timechart minspan=2m max(data.num_of_requests_waiting_ack) AS num_of_requests_waiting_ack + $timepicker.earliest$ + $timepicker.latest$ + + + + + + + +
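For reference, the acknowledgements counted in this panel only exist when indexer acknowledgement is enabled on the HEC input; that is the useACK flag on the token stanza (the stanza name and token value below are illustrative):

    # inputs.conf on the HEC receiver
    [http://my_hec_input]
    token = 11111111-2222-3333-4444-555555555555
    useACK = true

Clients must then track channels and poll the /services/collector/ack endpoint, so enable it only where the sender actually implements the ack protocol.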
diff --git a/apps/SplunkAdmins/default/data/ui/views/indexer_data_spread.xml b/apps/SplunkAdmins/default/data/ui/views/indexer_data_spread.xml new file mode 100644 index 00000000..efc32054 --- /dev/null +++ b/apps/SplunkAdmins/default/data/ui/views/indexer_data_spread.xml @@ -0,0 +1,170 @@ +
+ + Indexer Data Spread +
+ + + + -24h@h + now + + + + + + -24h@h + now + + +
+ + + Spread of data across the indexers + + + | tstats count WHERE index="*" by splunk_server _time span=10m | timechart span=10m sum(count) by splunk_server + $thetime.earliest$ + $thetime.latest$ + 1 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Indexed data in KB per second per indexer + + + (index=_internal `indexerhosts` `splunkadmins_metrics_source` sourcetype=splunkd group=thruput name=index_thruput) | eval ingest_pipe = if(isnotnull(ingest_pipe), ingest_pipe, "none") | search ingest_pipe=* | timechart minspan=30s per_second(kb) by host + $time_tok.earliest$ + $time_tok.latest$ + 1 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Forwarders and Throughput (from monitoring console) + + + index=_internal sourcetype=splunkd group=tcpin_connections (connectionType=cooked OR connectionType=cookedSSL) fwdType=* guid=* `indexerhosts` | timechart minspan=30s dc(guid) as forwarder_count, per_second(kb) as tcp_KBps | rename forwarder_count as "Forwarder Count", tcp_KBps as "Throughput (KB/s)" + $time_tok.earliest$ + $time_tok.latest$ + 1 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Incoming TCP Queues + + + index=_internal `indexerhosts` `splunkadmins_metrics_source` sourcetype=splunkd group=queue name=splunktcpin OR name=tcpin_cooked_pqueue + | eval max=if(isnotnull(max_size_kb),max_size_kb,max_size) + | eval curr=if(isnotnull(current_size_kb),current_size_kb,current_size) + | eval fill_perc=round((curr/max)*100,2) + | timechart minspan=30s Median(fill_perc) AS "fill_percentage" by host + $time_tok.earliest$ + $time_tok.latest$ + 1 + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
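To reduce the spread timechart above to a single balance figure, each indexer's share of events over the selected window can be computed directly:

    | tstats count where index=* by splunk_server
    | eventstats sum(count) AS total
    | eval pct=round(100*count/total,2)
    | sort - pct

On a well-balanced cluster the percentages should sit close together; a persistent outlier usually points at forwarder load-balancing or pipeline issues on that peer.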
diff --git a/apps/SplunkAdmins/default/data/ui/views/indexer_max_data_queue_sizes_by_name.xml b/apps/SplunkAdmins/default/data/ui/views/indexer_max_data_queue_sizes_by_name.xml new file mode 100644 index 00000000..8391adb6 --- /dev/null +++ b/apps/SplunkAdmins/default/data/ui/views/indexer_max_data_queue_sizes_by_name.xml @@ -0,0 +1,336 @@ +
+ +
+ + + + -4h@m + now + + +
+ + + Parsing Queue Fill Size + + + index=_internal `indexerhosts` `splunkadmins_metrics_source` sourcetype=splunkd group=queue (name=parsingqueue) +| eval name=case(name=="aggqueue","2 - Aggregation Queue", + name=="indexqueue", "4 - Indexing Queue", + name=="parsingqueue", "1 - Parsing Queue", + name=="typingqueue", "3 - Typing Queue") + | eval ingest_pipe = if(isnotnull(ingest_pipe), ingest_pipe, "none") | search ingest_pipe=* +| eval max=if(isnotnull(max_size_kb),max_size_kb,max_size) +| eval curr=if(isnotnull(current_size_kb),current_size_kb,current_size) +| eval fill_perc=round((curr/max)*100,2) +| eval combined = host . "_pipe_" . ingest_pipe +| timechart limit=14 useother=false span=1m Max(fill_perc) by combined + $time.earliest$ + $time.latest$ + + + + + + + + + + + + + + + + Aggregation Queue Fill Size + + + index=_internal `indexerhosts` `splunkadmins_metrics_source` sourcetype=splunkd group=queue (name=aggqueue) +| eval name=case(name=="aggqueue","2 - Aggregation Queue", + name=="indexqueue", "4 - Indexing Queue", + name=="parsingqueue", "1 - Parsing Queue", + name=="typingqueue", "3 - Typing Queue") + | eval ingest_pipe = if(isnotnull(ingest_pipe), ingest_pipe, "none") | search ingest_pipe=* +| eval max=if(isnotnull(max_size_kb),max_size_kb,max_size) +| eval curr=if(isnotnull(current_size_kb),current_size_kb,current_size) +| eval fill_perc=round((curr/max)*100,2) +| eval combined = host . "_pipe_" . ingest_pipe +| timechart limit=14 useother=false span=1m Max(fill_perc) by combined + $time.earliest$ + $time.latest$ + + + + + + + + + + + + Typing Queue Fill Size + + + index=_internal `indexerhosts` `splunkadmins_metrics_source` sourcetype=splunkd group=queue (name=typingqueue) +| eval name=case(name=="aggqueue","2 - Aggregation Queue", + name=="indexqueue", "4 - Indexing Queue", + name=="parsingqueue", "1 - Parsing Queue", + name=="typingqueue", "3 - Typing Queue") + | eval ingest_pipe = if(isnotnull(ingest_pipe), ingest_pipe, "none") | search ingest_pipe=* +| eval max=if(isnotnull(max_size_kb),max_size_kb,max_size) +| eval curr=if(isnotnull(current_size_kb),current_size_kb,current_size) +| eval fill_perc=round((curr/max)*100,2) +| eval combined = host . "_pipe_" . ingest_pipe +| timechart limit=14 useother=false span=1m Max(fill_perc) by combined + $time.earliest$ + $time.latest$ + + + + + + + + + + + + Indexing Queue Fill Size + + + index=_internal `indexerhosts` `splunkadmins_metrics_source` sourcetype=splunkd group=queue (name=indexqueue) +| eval name=case(name=="aggqueue","2 - Aggregation Queue", + name=="indexqueue", "4 - Indexing Queue", + name=="parsingqueue", "1 - Parsing Queue", + name=="typingqueue", "3 - Typing Queue") + | eval ingest_pipe = if(isnotnull(ingest_pipe), ingest_pipe, "none") | search ingest_pipe=* +| eval max=if(isnotnull(max_size_kb),max_size_kb,max_size) +| eval curr=if(isnotnull(current_size_kb),current_size_kb,current_size) +| eval fill_perc=round((curr/max)*100,2) +| eval combined = host . "_pipe_" . ingest_pipe +| timechart limit=14 useother=false span=1m Max(fill_perc) by combined + $time.earliest$ + $time.latest$ + + + + + + + + + + + + Shows any replication queue issues that may slowdown/prevent the queues from clearing at the indexer level + + The replication queue appears to directly relate to the indexing queue, any blockage of the indexing queue will then block the replication queue and temporarily slow data ingestion. 
The replication queue appears to be extremely sensitive to the other indexers indexing queue so it can be a useful measure of an issue... + + index=_internal `indexerhosts` "replication queue for " "full" OR "has room now" sourcetype=splunkd `splunkadmins_splunkd_source` +| rename peer AS guid +| join guid + [| rest /services/search/distributed/peers + | table guid peerName] +| transaction bid guid endswith="has room now" keeporphans=true +| timechart span=1m count, max(duration) AS duration by peerName + -60m@m + now + 1 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Blocked Indexing Queues + + + index=_internal `indexerhosts` `splunkadmins_metrics_source` sourcetype=splunkd group=queue | stats count(eval(isnotnull(blocked))) AS blockedCount, count by name, host, _time | eval percBlocked=(100/count)*blockedCount | eval hostQueue = host . "_" . name | timechart useOther=false span=10m avg(percBlocked) by hostQueue + $time.earliest$ + $time.latest$ + 1 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + TCPIn Queue Sizes (Max) + + + index=_internal `indexerhosts` `splunkadmins_metrics_source` sourcetype=splunkd group=queue (name=splunktcpin OR name=tcpin_cooked_pqueue) +| eval ingest_pipe = if(isnotnull(ingest_pipe), ingest_pipe, "none") | search ingest_pipe=* +| eval max=if(isnotnull(max_size_kb),max_size_kb,max_size) +| eval curr=if(isnotnull(current_size_kb),current_size_kb,current_size) +| eval fill_perc=round((curr/max)*100,2) +| eval combined = host . "_pipe_" . ingest_pipe +| timechart limit=14 useother=false span=1m max(fill_perc) by combined + $time.earliest$ + $time.latest$ + 1 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Thruput Per Indexer + + + index=_internal `indexerhosts` `splunkadmins_metrics_source` sourcetype=splunkd group=thruput name=index_thruput + | eval ingest_pipe = if(isnotnull(ingest_pipe), ingest_pipe, "none") | search ingest_pipe=* + | eval combined = host . "_pipe_" . ingest_pipe + | timechart useother=false span=1m limit=14 per_second(kb) by host + + $time.earliest$ + $time.latest$ + 1 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Forced closures on restart + + A potential indicator of data loss + + | tstats count where index=_internal sourcetype=splunkd `indexerhosts` `splunkadmins_splunkd_source` TERM("Forcing") groupby _time, host span=1s | timechart sum(count) by host + + $time.earliest$ + $time.latest$ + 1 + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
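A tuning note on the queue panels above: when every queue on a host sits near 100%, the bottleneck is usually at the end of the pipeline (indexing queue / disk IO); when only the earlier queues (parsing/aggregation) run hot and spare CPU cores exist, an additional ingestion pipeline set is one commonly considered option. A minimal sketch, assuming spare cores; server.conf on the indexer or heavy forwarder, value illustrative:

[general]
# each pipeline set then appears separately in the ingest_pipe split
# used by the panels above
parallelIngestionPipelines = 2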
diff --git a/apps/SplunkAdmins/default/data/ui/views/indexer_max_data_queue_sizes_by_name_v8.xml b/apps/SplunkAdmins/default/data/ui/views/indexer_max_data_queue_sizes_by_name_v8.xml new file mode 100644 index 00000000..b239374b --- /dev/null +++ b/apps/SplunkAdmins/default/data/ui/views/indexer_max_data_queue_sizes_by_name_v8.xml @@ -0,0 +1,334 @@ +
+ +
+ + + + -4h@m + now + + +
+ + + Parsing Queue Fill Size + + + index=_internal `indexerhosts` `splunkadmins_metrics_source` sourcetype=splunkd group=queue (name=parsingqueue) +| eval name=case(name=="aggqueue","2 - Aggregation Queue", + name=="indexqueue", "4 - Indexing Queue", + name=="parsingqueue", "1 - Parsing Queue", + name=="typingqueue", "3 - Typing Queue") + | eval ingest_pipe = if(isnotnull(ingest_pipe), ingest_pipe, "none") | search ingest_pipe=* +| eval max=if(isnotnull(max_size_kb),max_size_kb,max_size) +| eval curr=if(isnotnull(current_size_kb),current_size_kb,current_size) +| eval fill_perc=round((curr/max)*100,2) +| eval combined = host . "_pipe_" . ingest_pipe +| timechart limit=14 useother=false span=1m Max(fill_perc) by combined + $time.earliest$ + $time.latest$ + + + + + + + + + + + + + + + + Aggregation Queue Fill Size + + + index=_internal `indexerhosts` `splunkadmins_metrics_source` sourcetype=splunkd group=queue (name=aggqueue) +| eval name=case(name=="aggqueue","2 - Aggregation Queue", + name=="indexqueue", "4 - Indexing Queue", + name=="parsingqueue", "1 - Parsing Queue", + name=="typingqueue", "3 - Typing Queue") + | eval ingest_pipe = if(isnotnull(ingest_pipe), ingest_pipe, "none") | search ingest_pipe=* +| eval max=if(isnotnull(max_size_kb),max_size_kb,max_size) +| eval curr=if(isnotnull(current_size_kb),current_size_kb,current_size) +| eval fill_perc=round((curr/max)*100,2) +| eval combined = host . "_pipe_" . ingest_pipe +| timechart limit=14 useother=false span=1m Max(fill_perc) by combined + $time.earliest$ + $time.latest$ + + + + + + + + + + + + Typing Queue Fill Size + + + index=_internal `indexerhosts` `splunkadmins_metrics_source` sourcetype=splunkd group=queue (name=typingqueue) +| eval name=case(name=="aggqueue","2 - Aggregation Queue", + name=="indexqueue", "4 - Indexing Queue", + name=="parsingqueue", "1 - Parsing Queue", + name=="typingqueue", "3 - Typing Queue") + | eval ingest_pipe = if(isnotnull(ingest_pipe), ingest_pipe, "none") | search ingest_pipe=* +| eval max=if(isnotnull(max_size_kb),max_size_kb,max_size) +| eval curr=if(isnotnull(current_size_kb),current_size_kb,current_size) +| eval fill_perc=round((curr/max)*100,2) +| eval combined = host . "_pipe_" . ingest_pipe +| timechart limit=14 useother=false span=1m Max(fill_perc) by combined + $time.earliest$ + $time.latest$ + + + + + + + + + + + + Indexing Queue Fill Size + + + index=_internal `indexerhosts` `splunkadmins_metrics_source` sourcetype=splunkd group=queue (name=indexqueue) +| eval name=case(name=="aggqueue","2 - Aggregation Queue", + name=="indexqueue", "4 - Indexing Queue", + name=="parsingqueue", "1 - Parsing Queue", + name=="typingqueue", "3 - Typing Queue") + | eval ingest_pipe = if(isnotnull(ingest_pipe), ingest_pipe, "none") | search ingest_pipe=* +| eval max=if(isnotnull(max_size_kb),max_size_kb,max_size) +| eval curr=if(isnotnull(current_size_kb),current_size_kb,current_size) +| eval fill_perc=round((curr/max)*100,2) +| eval combined = host . "_pipe_" . ingest_pipe +| timechart limit=14 useother=false span=1m Max(fill_perc) by combined + $time.earliest$ + $time.latest$ + + + + + + + + + + + + Shows any replication queue issues that may slowdown/prevent the queues from clearing at the indexer level + + The replication queue appears to directly relate to the indexing queue, any blockage of the indexing queue will then block the replication queue and temporarily slow data ingestion. 
The replication queue appears to be extremely sensitive to the other indexers indexing queue so it can be a useful measure of an issue... + + index=_internal `indexerhosts` "replication queue for " "full" OR "has room now" sourcetype=splunkd `splunkadmins_splunkd_source` +| rename peer AS guid +| join guid + [| rest /services/search/distributed/peers + | table guid peerName] +| transaction bid guid endswith="has room now" keeporphans=true +| timechart span=1m count, max(duration) AS duration by peerName + -60m@m + now + 1 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Blocked Indexing Queues + + + index=_internal `indexerhosts` `splunkadmins_metrics_source` sourcetype=splunkd group=queue | stats count(eval(isnotnull(blocked))) AS blockedCount, count by name, host, _time | eval percBlocked=(100/count)*blockedCount | eval hostQueue = host . "_" . name | timechart useOther=false span=10m avg(percBlocked) by hostQueue + $time.earliest$ + $time.latest$ + 1 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + TCPIn Queue Sizes (Max) + + + index=_internal `indexerhosts` `splunkadmins_metrics_source` sourcetype=splunkd group=queue (name=splunktcpin OR name=tcpin_cooked_pqueue) +| eval ingest_pipe = if(isnotnull(ingest_pipe), ingest_pipe, "none") | search ingest_pipe=* +| eval max=if(isnotnull(max_size_kb),max_size_kb,max_size) +| eval curr=if(isnotnull(current_size_kb),current_size_kb,current_size) +| eval fill_perc=round((curr/max)*100,2) +| eval combined = host . "_pipe_" . ingest_pipe +| timechart limit=14 useother=false span=1m max(fill_perc) by combined + $time.earliest$ + $time.latest$ + 1 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Thruput Per Indexer + + + | tstats prestats=true sum(PREFIX(kb=)) where index=_internal `indexerhosts` TERM(group=thruput) TERM(name=index_thruput) `splunkadmins_metrics_source` sourcetype=splunkd `indexerhosts` groupby PREFIX(name=), host, _time span=1s + | timechart aligntime=latest useother=false span=10m limit=14 per_second(kb=) AS tcp_KBps by host + + $time.earliest$ + $time.latest$ + 1 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Forced closures on restart + + A potential indicator of data loss + + | tstats prestats=true count where index=_internal sourcetype=splunkd `indexerhosts` `splunkadmins_splunkd_source` TERM("Forcing") groupby _time, host span=1s | timechart count by host + + $time.earliest$ + $time.latest$ + 1 + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
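The _v8 variant above swaps the thruput and forced-closure searches to tstats with the PREFIX() directive, which requires Splunk 8.0 or later; that is why both versions of this dashboard ship. A minimal standalone sketch of the same idea, counting queue metrics by their indexed name= value:

| tstats count where index=_internal sourcetype=splunkd TERM(group=queue) groupby PREFIX(name=)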
diff --git a/apps/SplunkAdmins/default/data/ui/views/issues_per_sourcetype.xml b/apps/SplunkAdmins/default/data/ui/views/issues_per_sourcetype.xml new file mode 100644 index 00000000..c9197b72 --- /dev/null +++ b/apps/SplunkAdmins/default/data/ui/views/issues_per_sourcetype.xml @@ -0,0 +1,276 @@ +
+ + Detect time-parsing, event-breaking or truncation issues for a particular sourcetype. Please note that the investigation query is something you can copy & paste into a new search window in Splunk to find example events; it does not work 100% of the time... +
+ + + + -60m + now + + + + + +
+ + + Failure to parse timestamp correctly + + Timestamp parsing has failed; note that if the null queue is in use this can give false alarms (a props.conf sketch follows this panel) + + ```Timestamp parsing has failed, and it doesn't appear to be related to the event being broken due to having too many lines; that is a separate alert that may trigger a timestamp parsing issue (excluded from this alert as that issue needs to be resolved first) +Please note that you may see this particular warning on data that is sent to the nullQueue using a transforms.conf. Obviously you won't see this in the index but you will see the warning because the time parsing occurs before the transforms.conf is applied +This alert now checks for at least 2 failures, and header entries can often trigger 2 entries in the log files about timestamp parsing failures... +Finally, one strange edge case: a newline inserted into the log file (by itself, with no content before/afterward) can trigger the warning but nothing will get indexed; multiline_event_extra_waittime, time_before_close and EVENT_BREAKER can resolve this edge case``` +index=_internal sourcetype=splunkd ("Failed to parse timestamp" "Defaulting to timestamp of previous event") OR "Breaking event because limit of " OR "outside of the acceptable time window" (`indexerhosts`) OR (`heavyforwarderhosts`) $sourcetype$ +| bin _time span=1m +| eval host=data_host, source=data_source, sourcetype=data_sourcetype +| rex "source::(?P<source>[^|]+)\|host::(?P<host>[^|]+)\|(?P<sourcetype>[^|]+)" +| eventstats count(eval(isnotnull(data_host))) AS hasBrokenEventOrTruncatedLine, count(eval(searchmatch("outside of the acceptable time window"))) AS outsideTimewindow by _time, host, source, sourcetype +| where hasBrokenEventOrTruncatedLine=0 AND isnull(data_host) AND NOT searchmatch("outside of the acceptable time window") +```To investigate further we want the previous timestamp that Splunk used for the event in question; that way we can see what it looks like in raw format...``` +| rex "Defaulting to timestamp of previous event \((?P<previousTimeStamp>[^)]+)" +| eval previousTimeStamp=strptime(previousTimeStamp, "%a %b %d %H:%M:%S %Y") +| stats count, min(_time) AS firstSeen, max(_time) AS mostRecent, first(previousTimeStamp) AS recentExample, sum(outsideTimewindow) AS outsideTimewindow by host, sourcetype, source +| where count>0 +| stats sum(count) AS count, min(firstSeen) AS firstSeen, max(mostRecent) AS mostRecent, first(recentExample) AS recentExample, values(source) AS sourceList, sum(outsideTimewindow) AS outsideTimewindow by host, sourcetype +| eval invesEnd=recentExample+1 +| eval invesDataSource=sourceList +| eval invesDataSource=if(mvcount(invesDataSource)>1,mvjoin(invesDataSource,"\" OR source=\""),invesDataSource) +| eval invesDataSource = "source=\"" + invesDataSource + "\"" +| eval invesDataSource = replace(invesDataSource, "\\\\", "\\\\\\\\") +| eval investigationQuery="```The investigation query may find zero data if the data was sent to the null queue by a transforms.conf as the time parsing occurs before the transforms occur. If this source/sourcetype has a null queue you may need to exclude it from this alert. Note that the host= can be inaccurate if host overrides are in use in transforms.conf; if this query finds no results remove host=...``` index=* host=" . host . " sourcetype=\"" . sourcetype . "\" " . invesDataSource . " earliest=" . recentExample . " latest=" . invesEnd .
" | eval indextime=strftime(_indextime, \"%+\")" +| eval mostRecent=strftime(mostRecent, "%+"), firstSeen=strftime(firstSeen, "%+") +| eval outsideAcceptableTimeWindow=if(outsideTimewindow!=0,"Timestamp parsing failed due to being outside the acceptable time window","No") +| fields - recentExample, invesEnd, invesDataSource, outsideTimewindow +| sort - count + $time.earliest$ + $time.latest$ + 1 + + + + + + + + + +
+
+
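Where the alert above fires consistently for one sourcetype, the usual fix is explicit timestamp extraction on the parsing tier (heavy forwarder or indexer). A minimal props.conf sketch; the sourcetype name and format string are hypothetical and must be adapted to the data:

[my:custom:sourcetype]
# anchor the timestamp search and state the format explicitly
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S%z
# only scan the first 30 characters of the event for the timestamp
MAX_TIMESTAMP_LOOKAHEAD = 30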
+ + + Invalid parsed time + + The timestamp parsing did run, but the timestamp found did not match previous events, so the time parsing may need a review (a props.conf sketch follows this panel) + + ```The timestamp parsing did run, but the timestamp found did not match previous events, so the time parsing may need a review``` +index=_internal sourcetype=splunkd (`indexerhosts`) OR (`heavyforwarderhosts`) +"outside of the acceptable time window. If this timestamp is correct, consider adjusting" +OR "is too far away from the previous event's time" +OR "is suspiciously far away from the previous event's time" $sourcetype$ +| rex "source::(?P<source>[^|]+)\|host::(?P<host>[^|]+)\|(?P<sourcetype>[^|]+)" +| rex "Context: source=(?P<source>[^|]+)\|host=(?P<host>[^|]+)\|(?P<sourcetype>[^|]+)" +```The goal of this part of the search is to obtain the messages relating to this particular host/source/sourcetype; however, since the message includes a time we cannot use values(message) without getting a huge number of values, we therefore use cluster to obtain the unique values. Since we want the original start/end times we use labelonly=true``` +| cluster labelonly=true +| eval message=coalesce(message,event_message) +| stats count, min(_time) AS firstSeen, max(_time) AS lastSeen, first(message) AS message by host, source, sourcetype, cluster_label +```While 'A possible timestamp match (...) is outside of the acceptable time window' and 'Time parsed (...) is too far away from the previous event's time' result in the current indexing time being used, the 'Accepted time (...) is suspiciously far away from the previous event's time' is accepted and therefore we need to expand the investigation query time to include this time range as well!``` +| rex field=message "Accepted time \((?P<acceptedTime>[^\)]+)" +| eval acceptedTime=strptime(acceptedTime, "%a %b %d %H:%M:%S %Y") +| eval firstSeen=if(acceptedTime<firstSeen,acceptedTime,firstSeen) + ```Now that we have the first message for each labelled cluster, we take all relevant messages per host/source/sourcetype``` +| stats values(acceptedTime) AS acceptedTime, sum(count) AS count, min(firstSeen) AS firstSeen, max(lastSeen) AS lastSeen, values(message) AS message by host, source, sourcetype +| eval invesEnd=if(lastSeen=firstSeen,round(lastSeen+1),round(lastSeen)), invesStart=floor(firstSeen) +| eval invesDataSource = replace(source, "\\", "\\\\") +| eval investigationQuery="```Please note that this query may need to be narrowed down further before running it; this is an example only...``` index=* host=" . host . " sourcetype=\"" . sourcetype . "\" source=\"" . invesDataSource . "\" earliest=" . invesStart . " latest=" . invesEnd . " | eval indextime=strftime(_indextime, \"%+\")" +| eval firstSeen=strftime(firstSeen, "%+"), lastSeen=strftime(lastSeen, "%+") +| table host, source, sourcetype, count, firstSeen, lastSeen, message, investigationQuery +| sort - count + $time.earliest$ + $time.latest$ + 1 + + + + + + + + + +
+
+
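The 'acceptable time window' and 'too far away from the previous event's time' messages matched above are governed by per-sourcetype limits in props.conf. A sketch showing the shipped defaults against a hypothetical sourcetype; adjust these deliberately rather than silencing the warnings:

[my:custom:sourcetype]
MAX_DAYS_AGO = 2000
MAX_DAYS_HENCE = 2
MAX_DIFF_SECS_AGO = 3600
MAX_DIFF_SECS_HENCE = 604800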
+ + + Multiple time formats one sourcetype + + Normally this alert advises that a sourcetype is being used by multiple unique types of data (i.e. it should be more than one sourcetype); one way to fix this is the sourcetype= setting in inputs.conf on the universal forwarder (see the sketch after this panel) + + ```This search detects when the time format has changed within the files 1 or more times; the time format per sourcetype should be consistent``` +index=_internal "DateParserVerbose - Accepted time format has changed" sourcetype=splunkd (`indexerhosts`) OR (`heavyforwarderhosts`) $sourcetype$ +| rex "source(?:=|::)(?P<source>[^|]+)\|host(?:=|::)(?P<host>[^|]+)\|(?P<sourcetype>[^|]+)" +| eval message=coalesce(message,event_message) +| stats count, min(_time) AS firstSeen, max(_time) AS lastSeen by host, source, sourcetype, message +| eval invesMaxTime=if(firstSeen=lastSeen,lastSeen+1,lastSeen) +| eval invesDataSource = replace(source, "\\", "\\\\") +| eval potentialInvestigationQuery="```If no results are found, prepend the earliest=/latest= with _index_ (eg _index_earliest=...) and expand the timeframe searched over, as the parsed timestamps from the data do not have to exactly match the time the warnings appeared...``` index=* sourcetype=\"" . sourcetype . "\" source=\"" . invesDataSource . "\" host=" . host . " earliest=" . firstSeen . " latest=" . invesMaxTime . " | eval start=substr(_raw, 0, 30) | cluster field=start" +| eval firstSeen=strftime(firstSeen, "%+"), lastSeen=strftime(lastSeen, "%+") +| fields - invesMaxTime, invesDataSource +| sort - count + $time.earliest$ + $time.latest$ + 1 + + + + + + + + + +
+
+
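As the panel description notes, the cleanest fix is to stop sharing one sourcetype across distinct data formats at the input itself. A minimal inputs.conf sketch with hypothetical paths and sourcetype names:

[monitor:///var/log/app/access.log]
sourcetype = app:access

[monitor:///var/log/app/debug.log]
sourcetype = app:debug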
+ + + Events Broken due to size limits + + The event that came in was greater than the maximum number of lines that were configured; it was therefore broken into multiple events... (use a LINE_BREAKER or adjust MAX_EVENTS; a props.conf sketch follows this panel) + + ```The event that came in was greater than the maximum number of lines that were configured; it was therefore broken into multiple events... +Also refer to the monitoring console Indexing -> Inputs -> Data Quality``` +index=_internal "AggregatorMiningProcessor - Breaking event because limit of" sourcetype=splunkd data_sourcetype=$sourcetype$ +| rex "Breaking event because limit of (?P<curlimit>\d+)" +| stats max(_time) AS mostRecent, min(_time) AS firstSeen, count by data_sourcetype, data_host, curlimit +| eval longerThan=curlimit-1 +| eval invesLatest = if(mostRecent==firstSeen,mostRecent+1,mostRecent) +| rename data_sourcetype AS sourcetype, data_host AS host +| eval investigationQuery="```If no results are found, prepend the earliest=/latest= with _index_ (eg _index_earliest=...) and expand the timeframe searched over, as the parsed timestamps from the data do not have to exactly match the time the warnings appeared...``` index=* host=" . host . " sourcetype=\"" . sourcetype . "\" linecount>" . longerThan . " earliest=" . firstSeen . " latest=" . invesLatest +| fields - firstSeen, longerThan, invesLatest +| eval mostRecent=strftime(mostRecent, "%+") +| sort - count + $time.earliest$ + $time.latest$ + 1 + + + + + + + + + +
+
+
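The two remedies mentioned in the panel description look like this in props.conf; a sketch with a hypothetical sourcetype and an illustrative date-anchored breaker (LINE_BREAKER must contain a capturing group, whose match is discarded at the break):

[my:multiline:sourcetype]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)\d{4}-\d{2}-\d{2}
# alternatively keep line merging and raise the ceiling:
# MAX_EVENTS = 1000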
+ + + Data found to be from the future + + Note: hardcoded to look for data indexed in the past week. Finds events which have future-based dates on them; any results are a problem + + index=* earliest=+5m latest=+5y sourcetype=$sourcetype$ +| eval ahead=abs(now() - _time) +| eval indextime=_indextime +| bin span=1d indextime +| eval timeToLookBack=now()-(60*60*24*7) +| stats avg(ahead) as averageahead, max(_time) AS maxTime, min(_time) as minTime, count, first(timeToLookBack) AS timeToLookBack by sourcetype, index, indextime +| where indextime>timeToLookBack AND averageahead > 1000 +| eval averageahead=tostring(averageahead, "duration") +| eval invesMaxTime=if(minTime=maxTime,maxTime+1,maxTime) +| eval investigationQuery="index=" . index . " sourcetype=\"" . sourcetype . "\" earliest=" . minTime . " latest=" . invesMaxTime . " _index_earliest=" . timeToLookBack . " +| eval indextime=strftime(_indextime, \"%+\")" +| eval indextime=strftime(indextime, "%+"), maxTime = strftime(maxTime, "%+"), minTime = strftime(minTime, "%+") +| table sourcetype, index, averageahead, indextime, minTime, maxTime, count, investigationQuery +| sort - count + -24h@h + now + 1 + + + + + + + + + +
+
+
+ + + Old data ingested recently + + Hardcoded to find older data which was ingested during the past week + + | tstats max(_time) AS mostRecentlySeen, max(_indextime) AS mostRecentlyIndexed, min(_time) AS earliestSeen, min(_indextime) AS earliestIndexTime, count + where _index_earliest=-7d, earliest=-300d, latest=-7d, sourcetype=$sourcetype$ + groupby source, sourcetype, index, host +| eval invesDataSource = replace(source, "\\", "\\\\"), invesLatestTime=mostRecentlySeen+1, invesLatestIndexTime=mostRecentlyIndexed+1 +| eval investigationQuery="```Narrow down to the older part of the timeline after this query runs to see the potential issue...``` index=" . index . " source=\"" . invesDataSource . "\" sourcetype=\"" . sourcetype . "\" host=" . host . " earliest=" . earliestSeen . " latest=" . invesLatestTime . " _index_earliest=" . earliestIndexTime . " _index_latest=" . invesLatestIndexTime . " | eval indextime=strftime(_indextime, \"%+\")" +| eval mostRecentlySeen=strftime(mostRecentlySeen, "%+"), mostRecentlyIndexed=strftime(mostRecentlyIndexed, "%+") +| sort index, host, sourcetype +| table index, source, sourcetype, host, mostRecentlySeen, mostRecentlyIndexed, count, investigationQuery + -12d@d + now + 1 + + + + + + + +
+
+
+ + + Truncated data + + The line was truncated due to length; the TRUNCATE setting may need tweaking (or it may be just bad data coming in). A props.conf sketch follows this panel + + ```The line was truncated due to length; the TRUNCATE setting may need tweaking (or it may be just bad data coming in) +Also refer to the Monitoring Console, Indexing -> Inputs -> Data Quality +If you are in a (very) performance-sensitive environment you might want to remove the rex/eval lines for the data_host field and let the admin update the investigation query manually``` +index=_internal "Truncating line because limit of" sourcetype=splunkd data_sourcetype=$sourcetype$ (`heavyforwarderhosts`) OR (`indexerhosts`) +| rex "Truncating line because limit of (?P<curlimit>\d+) bytes.*with a line length >= (?P<approxlinelength>\S+)" +| rex field=data_host "(?P<data_host>[^\.]+)" +| eval data_host=data_host . "*" +| stats min(_time) AS firstSeen, max(_time) AS lastSeen, count, avg(approxlinelength) AS avgApproxLineLength, max(approxlinelength) AS maxApproxLineLength, values(data_host) AS hosts by data_sourcetype, curlimit +| rename data_sourcetype AS sourcetype +| eval hostList=if(mvcount(hosts)>1,mvjoin(hosts," OR host="),hosts) +| eval hostList="host=" . hostList +| eval avgApproxLineLength = round(avgApproxLineLength) +| eval invesLastSeen=if(firstSeen==lastSeen,lastSeen+1,lastSeen) +| eval firstSeen=firstSeen-10 +| eval invesLastSeen=invesLastSeen+10 +| eval investigationQuery="```Find examples where the truncation limit has been reached. The earliest/latest time is based on the warning messages in the Splunk logs; they may need customisation!``` index=* sourcetype=" . sourcetype . " " . hostList . " earliest=" . firstSeen . " latest=" . invesLastSeen . " | where len(_raw)=" . curlimit +| sort - count +| eval lastSeen=strftime(lastSeen, "%+") +| table sourcetype, curlimit, count, avgApproxLineLength, maxApproxLineLength, lastSeen, investigationQuery +| where count>0 + $time.earliest$ + $time.latest$ + 1 + + + + + + + + + +
+
+
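A props.conf sketch for the TRUNCATE remedy mentioned in the panel above; the sourcetype and value are hypothetical. TRUNCATE is measured in bytes and 0 means unlimited, which is best avoided as a single bad line can then consume significant memory:

[my:verbose:sourcetype]
TRUNCATE = 100000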
+
diff --git a/apps/SplunkAdmins/default/data/ui/views/knowledge_objects_by_app.xml b/apps/SplunkAdmins/default/data/ui/views/knowledge_objects_by_app.xml new file mode 100644 index 00000000..705d7329 --- /dev/null +++ b/apps/SplunkAdmins/default/data/ui/views/knowledge_objects_by_app.xml @@ -0,0 +1,103 @@ +
+ + List of knowledge objects per app +
+ + + app + app + + | rest /services/apps/local search="disabled=0" count=0 f=title splunk_server=local +| rename title as app +| table app + -24h@h + now + + + + + all + datamodel + calcfields + macros + type + type + + | rest "/servicesNS/-/$app$/directory" count=0 splunk_server=local +| search eai:acl.app=$app$ +| rename eai:type AS type +| search type!="macros" ```macros only appears in really new versions of Splunk via the directory endpoint, so assume it doesn't exist in this query``` +| stats count by type +| fields - count + -24h@h + now + + all + * + +
+ + + Knowledge object summary + + + | rest "/servicesNS/-/$app$/directory" count=0 splunk_server=local +| search eai:acl.app=$app$ +| eval updatedEpoch=strptime(updated,"%Y-%m-%dT%H:%M:%S%:z") +| rename eai:type AS type, eai:acl.app AS app, eai:location AS location +| append [ rest splunk_server=local /servicesNS/-/$app$/datamodel/model count=0 f=updated f=eai:appName | rename eai:appName AS app | eval type="datamodel" ] +| append [ | rest splunk_server=local /servicesNS/-/$app$/data/props/calcfields count=0 | eval type="calcfields" | rename eai:acl.app AS app] +| append [ | rest splunk_server=local /servicesNS/-/$app$/configs/conf-macros count=0 | rename eai:appName AS app | eval type="macros"] +| fillnull location value="N/A" +| search app=$app$ +| stats count by type, app, location + -4h@m + now + 1 + + + + + + + + + +
+
+
+ + + Knowledge Objects by app semi-detailed + + Click any row for the drilldown... + + | rest "/servicesNS/-/$app$/directory" count=0 splunk_server=local +| search eai:acl.app=$app$ +| eval updatedEpoch=strptime(updated,"%Y-%m-%dT%H:%M:%S%:z") +| rename eai:type AS type, eai:acl.app AS app, eai:location AS location +| append [ rest splunk_server=local /servicesNS/-/$app$/datamodel/model count=0 f=updated f=eai:appName | rename eai:appName AS app | eval type="datamodel" ] +| append [ | rest splunk_server=local /servicesNS/-/$app$/data/props/calcfields count=0 | eval type="calcfields" | rename eai:acl.app AS app] +| append [ | rest splunk_server=local /servicesNS/-/$app$/configs/conf-macros count=0 | rename eai:appName AS app | eval type="macros"] +| fillnull location value="N/A" +| search app=$app$, type=$type$ +| stats values(title) AS names, values(updated) AS updated by eai:acl.owner, eai:acl.sharing, type +| rename eai:acl.sharing AS sharing, eai:acl.owner AS owner + -4h@m + now + 1 + + + + + + + + + + /app/SplunkAdmins/knowledge_objects_by_app_drilldown?form.app=$app$&form.type=$row.type$&form.sharing=$row.sharing$&form.owner=$row.owner$ + +
+
+
+
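For ad-hoc use outside this dashboard, the same /servicesNS/-/<app>/directory endpoint can be queried directly; a minimal sketch counting private knowledge objects per owner in the search app (app name illustrative):

| rest /servicesNS/-/search/directory count=0 splunk_server=local
| search eai:acl.app=search eai:acl.sharing=user
| stats count by eai:type, eai:acl.owner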
diff --git a/apps/SplunkAdmins/default/data/ui/views/knowledge_objects_by_app_drilldown.xml b/apps/SplunkAdmins/default/data/ui/views/knowledge_objects_by_app_drilldown.xml new file mode 100644 index 00000000..48877ba7 --- /dev/null +++ b/apps/SplunkAdmins/default/data/ui/views/knowledge_objects_by_app_drilldown.xml @@ -0,0 +1,94 @@ +
+ + List of knowledge objects per app by user/sharing level +
+ + + app + app + + | rest /services/apps/local search="disabled=0" count=0 f=title splunk_server=local +| rename title as app +| table app + -24h@h + now + + + + + all + datamodel + calcfields + macros + type + type + + | rest "/servicesNS/-/$app$/directory" count=0 splunk_server=local +| search eai:acl.app=$app$ +| rename eai:type AS type +| stats count by type +| fields - count + -24h@h + now + + all + * + + + + * + + + + All + app + user (private) + global + * + * + + + + * + + + + Yes + No + * + +
+ + + Knowledge object summary + + + | rest "/servicesNS/-/$app$/directory" count=0 splunk_server=local +| search eai:acl.app=$app$ +| eval updatedEpoch=strptime(updated,"%Y-%m-%dT%H:%M:%S%:z") +| rename eai:type AS type, eai:acl.app AS app, eai:location AS location +| append [ rest splunk_server=local /servicesNS/-/$app$/datamodel/model count=0 f=updated f=eai:appName | rename eai:appName AS app | eval type="datamodel" ] +| append [ | rest splunk_server=local /servicesNS/-/$app$/data/props/calcfields count=0 | eval type="calcfields" | rename eai:acl.app AS app] +| append [ | rest splunk_server=local /servicesNS/-/$app$/configs/conf-macros count=0 | rename eai:appName AS app | eval type="macros"] +| fillnull disabled +| search app=$app$ type=$type$ title=$name$ eai:acl.sharing=$sharing$ disabled=$disabled$ eai:acl.owner=$owner$ +| fillnull location value="N/A" +| rename title AS name, eai:acl.owner AS owner, eai:acl.sharing AS sharing +| eval disabled=case(disabled==0,"false",disabled==1,"true",1==1,"Unknown") +| table name, description, disabled, owner, sharing, type, updated + -4h@m + now + 1 + + + + + + + + + +
+
+
+
diff --git a/apps/SplunkAdmins/default/data/ui/views/lookup_audit.xml b/apps/SplunkAdmins/default/data/ui/views/lookup_audit.xml new file mode 100644 index 00000000..c009c624 --- /dev/null +++ b/apps/SplunkAdmins/default/data/ui/views/lookup_audit.xml @@ -0,0 +1,178 @@ +
+ + Dashboard for displaying lookup table files on a Search Head. Created to easily identify large tables which might disrupt Splunk uptime. Created by Discovered Intelligence -- https://discoveredintelligence.ca, modifications by Gareth Anderson + + | rest /servicesNS/nobody/$appselection_rest$/data/lookup-table-files splunk_server=local +| rename eai:acl.app as appname +| regex appname=^$appselection$$ +| dedup appname +| map maxsearches=5000 search=" | rest /servicesNS/-/$appselection_rest$/admin/file-explorer/$splunk_dir|u$%2Fapps%2F$$appname$$%2Flookups splunk_server=local | eval appname=\"$$appname$$\"" + +
+ + + Show All Lookups + Exclude Blacklisted Lookups + Show Only Blacklisted Lookups + + + + + + + + + + + + + * + + + NonBlackList + NonBlackList + + + + All + appname + appname + + | rest /servicesNS/-/-/data/lookup-table-files splunk_server=local +| where like(title,"%csv") +| rename eai:acl.app as appname +| dedup appname +| sort appname + -15m + now + + + + - + + + $value$ + + + + + + All + Yes + No + * + * + + + + /opt/splunk/etc + +
+ + + Lookup Files by App + + + | rex field=title "[\\\\/]apps[\\\\/](?P<App>.+)[\\\\/]lookups" +| sort - lastModifiedTime +| eval "Last Modified" = strftime(lastModifiedTime,"%b %d, %Y %H:%M"), fileSize_MB=round((fileSize/1024),3) +| fillnull value=0.000 fileSize_MB +| fields App name fileSize_MB "Last Modified" title +| rex field=title "(?<title>apps.*)$" +| search $blacklist$ +| join type=left name + [| rest /servicesNS/nobody/$appselection_rest$/data/lookup-table-files splunk_server=local + | rename title AS name + | fields + name author] +| eval private_lookup="No" +| append + [| rest /servicesNS/-/$appselection_rest$/data/lookup-table-files splunk_server=local + | regex eai:data="[\\\\/]users[\\\\/]$appselection$[\\\\/][^\\\\/]+[\\\\/]lookups[/\\\\]" + | rename eai:acl.app as appname, eai:userName AS user + | search appname=* + | dedup appname + | map maxsearches=5000 search=" | rest /servicesNS/-/$appselection_rest$/admin/file-explorer/$splunk_dir|u$%2Fusers%2F$$user$$%2F$$appname$$%2Flookups splunk_server=local" + | rex field=title "[\\\\/]users[\\\\/]$appselection$[\\\\/](?<App>.+)[\\\\/]lookups[\\\\/]" + | sort - lastModifiedTime + | eval "Last Modified" = strftime(lastModifiedTime,"%b %d, %Y %H:%M"), fileSize_MB=round((fileSize/1024),3) + | fillnull value=0.000 fileSize_MB + | fields App name fileSize_MB "Last Modified" title + | rex field=title "(?<title>users.*)$" + | search $blacklist$ + | join type=left name + [| rest /servicesNS/-/$appselection_rest$/data/lookup-table-files splunk_server=local + | regex eai:data="$splunk_dir$[\\\\/]users[\\\\/]$appselection$[\\\\/]" + | rename title AS name + | fields + name author] + | eval private_lookup="Yes" + ] +| rename title AS path +| search private_lookup="$priv_lookup$" +| sort - fileSize_MB + + + + + + + + + +
+
+
+ + + Lookup Subdirectories by App + + Note: blacklist does not work for this panel and the last modified is directory modification date. If the author is blank then no matching lookup definition of type geo was found. Finally, as per the open ideas, the sub-directories under the lookups directory are never reaped by Splunk as of 8.0.3, it is upto the administrator to remove them as required. Also note they are not blacklisted from the knowledge bundle to the search peers, and finally they are created when the geom command is used so can be different per-search head! + + | eval last_modified = strftime(lastModifiedTime,"%b %d, %Y %H:%M") +| search hasSubNodes=1 +| map maxsearches=5000 search=" | rest /servicesNS/-/$appselection_rest$/admin/file-explorer/$splunk_dir|u$%2Fapps%2F$$appname$$%2Flookups%2F$$name$$ splunk_server=local | eval last_modified=\"$$last_modified$$\"" +| rex field=title "(?P<path>[^/\\\\]+[/\\\\](?P<App>[^/\\\\]+)[/\\\\][^/\\\\]+[/\\\\](?P<dirname>[^/\\\\]+))[/\\\\][^/\\\\]+$" +| stats sum(fileSize) AS fileSize, values(last_modified) AS "Last Modified" by dirname, App, path +| append + [| rest /servicesNS/-/$appselection_rest$/data/lookup-table-files splunk_server=local + | regex eai:data="$splunk_dir$[\\\\/]users[/\\\\][^/\\\\]+[/\\\\]$appselection$[\\\\/]" + | rename eai:acl.app as appname, eai:userName AS user + | search appname=* + | dedup appname + | map maxsearches=5000 search=" | rest /servicesNS/-/$appselection_rest$/admin/file-explorer/$splunk_dir|u$%2Fusers%2F$$user$$%2F$$appname$$%2Flookups splunk_server=local | eval appname=\"$$appname$$\", user=\"$$user$$\"" + | search NOT ignoreme="true" + | search hasSubNodes=1 + | eval last_modified = strftime(lastModifiedTime,"%b %d, %Y %H:%M") + | fillnull last_modified + | map maxsearches=5000 search=" | rest /servicesNS/-/$appselection_rest$/admin/file-explorer/$splunk_dir|u$%2Fusers%2F$$user$$%2F$$appname$$%2Flookups%2F$$name$$ splunk_server=local | eval last_modified=\"$$last_modified$$\"" + | rex field=title "(?P<path>([^/\\\\]+[/\\\\]){2}(?P<App>[^/\\\\]+)[/\\\\][^/\\\\]+[/\\\\](?P<dirname>[^/\\\\]+))[/\\\\][^/\\\\]+$" + | stats sum(fileSize) AS fileSize, values(last_modified) AS "Last Modified" by dirname, App, path ] +| eval fileSize_MB=round((fileSize/1024),3) +| table App, dirname, fileSize_MB, "Last Modified" path +| join type=left dirname + [| rest /servicesNS/-/$appselection_rest$/data/transforms/lookups splunk_server=local search="type=geo" f=title + | fields + dirname author] +| sort - fileSize_MB + + + + +
+
+
+
diff --git a/apps/SplunkAdmins/default/data/ui/views/lookups_in_use_finder.xml b/apps/SplunkAdmins/default/data/ui/views/lookups_in_use_finder.xml new file mode 100644 index 00000000..903ffd73 --- /dev/null +++ b/apps/SplunkAdmins/default/data/ui/views/lookups_in_use_finder.xml @@ -0,0 +1,117 @@ +
+ + Attempt to detect whether a given lookup file is in use within Splunk +
+ + + + + + All + - + - + app + app + + | rest /services/apps/local search="disabled=0" count=0 f=title splunk_server=local +| rename title as app +| table app + -15m + now + + + + + + -60m@m + now + + +
+ + + Dashboard or Scheduled Search lookups + + + | makeresults +| eval filename="$lookup_name$", lookupDefName=null() +| fields - _time +| append + [| rest splunk_server=local "/servicesNS/-/$app$/data/transforms/lookups" f=eai:* f=filename f=title f=updated + | search filename="$lookup_name$" + | fields title + | rename title AS lookupDefName ] +| tail 1 +| fillnull lookupDefName value="youwontfindthisone" +| appendpipe +[ | map + [| rest /servicesNS/-/$app$/data/ui/views splunk_server=local f=eai:* f=label f=title + | fields eai:acl.app, label, title, updated, eai:acl.owner, eai:data + | regex eai:data="(input|output)?lookup\s+($lookup_name$|$$lookupDefName$$)" + | eval type="dashboard" + | fields - eai:data ] ] +| appendpipe [ | map + [| rest /servicesNS/-/$app$/saved/searches splunk_server=local f=eai:* f=title f=search f=updated + | fields eai:acl.owner, title, search, updated, eai:acl.app + | regex search="(input|output)?lookup\s+($lookup_name$|$$lookupDefName$$)" + | eval type="report" + | fields - search ]] +| where isnotnull('eai:acl.app') +| eval searchedApp="$app$" +| where 'eai:acl.app'==searchedApp OR "$app$"=="-" +| fields - filename, lookupDefName +| rename eai:acl.app AS app, eai:acl.owner AS owner + -5m + now + 1 + + + + + + + + + +
+
+
+ + + Audit Logs Check (note no app context available) + + + | makeresults +| eval filename="$lookup_name$", lookupDefName=null() +| fields - _time +| append + [| rest splunk_server=local "/servicesNS/-/$app$/data/transforms/lookups" f=eai:* f=filename f=title f=updated + | search filename="$lookup_name$" + | fields title + | rename title AS lookupDefName ] +| tail 1 +| fillnull lookupDefName value="youwontfindthisone" +| appendpipe + [ map + [ search index=_audit "info=granted" "search='search " $lookup_name$ search_id!="'ta_*" + | rex ", search='(?P<search>[\S+\s+]+?)', " + | regex search="(input|output)?lookup\s+($lookup_name$|$$lookupDefName$$)" + | fields user, search, search_id, savedsearch_name] ] +| where isnotnull(user) +| table user, search, search_id, savedsearch_name + $time.earliest$ + $time.latest$ + 1 + + + + + + + + + +
+
+
+
diff --git a/apps/SplunkAdmins/default/data/ui/views/rolled_buckets_by_index.xml b/apps/SplunkAdmins/default/data/ui/views/rolled_buckets_by_index.xml new file mode 100644 index 00000000..44b17862 --- /dev/null +++ b/apps/SplunkAdmins/default/data/ui/views/rolled_buckets_by_index.xml @@ -0,0 +1,230 @@ +
+ + A very simple dashboard to determine which index is rolling the largest number of buckets and may therefore require tuning +
+ + + + -3d + @d + + + + + 3 + 7 + 14 + 30 + 60 + 7 + - + d + +
+ + + Number of buckets rolled from hot to warm + + Buckets rolled per day per index, top 15 indexes + + index=_internal "HotBucketRoller" sourcetype=splunkd `splunkadmins_splunkd_source` `indexerhosts` "finished moving" +| bin _time span=24h +| chart limit=15 useother=false count by _time, idx + $time.earliest$ + $time.latest$ + 1 + + + + + $click.name2$ + + + + + + + Buckets with largest timespan + + Buckets sorted by longest average time period (often indicates a timestamp parsing issue as large time periods trigger the buckets to roll early) + + | dbinspect index=* +| eval timePeriod=(endEpoch-startEpoch)/60/60/24 +| stats avg(timePeriod) AS avgTimePeriod, max(timePeriod) AS maxTimePeriod by index +| where avgTimePeriod>5 +| sort - avgTimePeriod + $time.earliest$ + $time.latest$ + 1 + + + + + + + + + + + $click.value$ + +
+
+
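If an index rolls many buckets purely from data volume (rather than rolling early due to the wide bucket timespans shown in the right-hand panel), raising the bucket size may help; an indexes.conf sketch with a hypothetical index name:

[my_high_volume_index]
# auto_high_volume is approximately 10GB buckets on 64-bit systems,
# versus 750MB for the default of auto
maxDataSize = auto_high_volume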
+ + + License usage for index $indexname$ + + Click on an index above for this drilldown to show the license usage for that particular index + + index=_internal `licensemasterhost` `splunkadmins_license_usage_source` idx=$indexname$ +| bin _time span=24h +| stats sum(b) AS totalB by idx, _time +| eval totalB=totalB/1024/1024/1024 +| chart avg(totalB) AS totalGB by _time, idx + $time.earliest$ + $time.latest$ + 1 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Bucket Info From DBInspect + + Show the length of time for the average bucket from this particular index + + | dbinspect index=$indexname$ +| eval timePeriod=(endEpoch-startEpoch)/60/60/24 +| stats avg(timePeriod) AS avgTimePeriod, min(timePeriod) AS minTimePeriod, max(timePeriod) AS maxTimePeriod, max(sizeOnDiskMB) AS maxSizeMB, avg(sizeOnDiskMB) AS avgSizeMB by index +| append + [| rest `splunkindexerhostsvalue` /services/data/indexes + | search title=$indexname$ + | head 1 + | table maxDataSize ] + $days$ + $time.latest$ + 1 + + + + + + + + + +
+
+
+ + + Sourcetype info for $indexname$ + + Click on any sourcetype to drill down to the historic data in the past week for that sourcetype... + + | tstats count where index=$indexname$ groupby sourcetype +| sort - count + $days$ + $time.latest$ + 1 + + + + + + + + + + $click.value2$ +
+
+
+ + + Historic data for index $indexname$ indexed in the past $days$ + + Find data indexed in the past $days$ days that is at least 30 days old for sourcetype $sourcetype$ in index $indexname$ + + index=$indexname$ sourcetype=$sourcetype$ _index_earliest=-7d earliest=-300d latest=-30d +| eval indextime=strftime(_indextime, "%+") + $days$ + now + 1 + + + + + + + + + + + + + + + + + + Future based data + + Future based data for sourcetype $sourcetype$ in index $indexname$ indexed in the past $days$ days + + index=$indexname$ sourcetype=$sourcetype$ earliest=+5m latest=+5y _index_earliest=$days$ +| eval indextime=strftime(_indextime, "%+") + -5m + now + 1 + + + + + + + + + + + + + + + +
diff --git a/apps/SplunkAdmins/default/data/ui/views/search_head_scheduledsearches_distribution.xml b/apps/SplunkAdmins/default/data/ui/views/search_head_scheduledsearches_distribution.xml new file mode 100644 index 00000000..54c7ee08 --- /dev/null +++ b/apps/SplunkAdmins/default/data/ui/views/search_head_scheduledsearches_distribution.xml @@ -0,0 +1,86 @@ +
+ + Number of scheduler searches per search head +
+ + + + -24h@h + now + + +
+ + + Searches per search head + + + index=_internal `searchheadhosts` sourcetype=scheduler status=delegated_remote_completion | timechart count by member_label + $time.earliest$ + $time.latest$ + 1 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Scheduled searches starting later than 100 seconds after the scheduled time (mostly harmless as the now time period relates to the original scheduled time) + + + None + _ACCELERATE + __NOEXCLUSION__ + __NOEXCLUSION__ + + + + * + * + + + + noexclusion + noexclusion + + + + index=_internal `searchheadhosts` sourcetype=scheduler app=* scheduled_time=* savedsearch_name!=$exclude$ user=$userequals$ user!=$usernotequalto$ | eval time=strftime(_time,"%Y-%m-%d %H:%M:%S") | eval delay_in_start = (dispatch_time - scheduled_time) | where delay_in_start>100 | eval scheduled_time=strftime(scheduled_time,"%Y-%m-%d %H:%M:%S") | eval dispatch_time=strftime(dispatch_time,"%Y-%m-%d %H:%M:%S") | rename time AS endTime | table host,savedsearch_name,delay_in_start, scheduled_time, dispatch_time, endTime, run_time, status, user, app | sort -delay_in_start | dedup host,savedsearch_name,delay_in_start + $time.earliest$ + $time.latest$ + 1 + + + + + + + + +
+
+
+
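Where the delayed-start panel shows chronic rather than occasional delays, the scheduler may be exhausting its share of search slots; one lever is the scheduler's percentage of total concurrent searches. A limits.conf sketch for the search head(s); the value is illustrative and the shipped default is 50:

[scheduler]
max_searches_perc = 75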
diff --git a/apps/SplunkAdmins/default/data/ui/views/smartstore_stats.xml b/apps/SplunkAdmins/default/data/ui/views/smartstore_stats.xml new file mode 100644 index 00000000..6dacf28d --- /dev/null +++ b/apps/SplunkAdmins/default/data/ui/views/smartstore_stats.xml @@ -0,0 +1,178 @@ +
+ + Also refer to https://github.com/camrunr/s2_traffic_report/blob/master/s2_traffic_report.xml for an alternative view of SmartStore downloads/uploads. To determine which searches are causing cache misses, refer to the SearchHeadLevel - SmartStore cache misses reports in this app. Note that the cache misses combined report requires the search to complete, while the indexing-tier version can catch an in-progress search +
+ + + + -60m@m + now + + + + + All + download + upload + * + + + + + + + + All Indexers + `indexerhosts` + +
+ + + Also refer to + SmartStore S2S Traffic report for an alternative dashboard view or SearchHeadLevel - SmartStore cache misses combined or SmartStore cache misses - remote_searches to find the searches that are triggering the cache misses + + + + + Upload/download latency + + + index=_internal $host$ TERM(status=succeeded) OR TERM(status=failed) sourcetype=splunkd `splunkadmins_splunkd_source` TERM(action=$action$) +| rangemap field=kb under_300=0-307200 300_700=307201-716800 700_1000=716801-1024000 default=over1000 +| eval combined = action . "_" . range +| timechart avg(elapsed_ms) AS avg_elapsed_ms, max(elapsed_ms) AS max_elapsed_ms by combined + $time.earliest$ + $time.latest$ + 1 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Upload/download thruput + + + index=_internal sourcetype=splunkd `splunkadmins_splunkd_source` $host$ TERM(status=succeeded) OR TERM(status=failed) TERM(action=$action$) +| timechart sum(eval(kb/1024)) AS MB by action + $time.earliest$ + $time.latest$ + + + + + + + + + + CacheManager Queued download count + + + ```Relates to [cachemanager] max_concurrent_downloads in server.conf. Thanks to Splunk support for the original version of this search``` index=_internal $host$ `splunkadmins_metrics_source` TERM(group=cachemgr_download) sourcetype=splunkd queued +| timechart partial=f limit=50 avg(queued) AS avg_queued by host +| eval ceiling=20 + $time.earliest$ + $time.latest$ + + + + + + + + + + CacheManager hits/misses + + + + index=_internal $host$ `splunkadmins_metrics_source` sourcetype=splunkd group=cachemgr_bucket TERM(cache_hit=*) OR TERM(cache_miss=*) +| timechart sum(cache_hit) as Hits sum(cache_miss) as Misses + + $time.earliest$ + $time.latest$ + + + + + + + + + + Excessive cachemanager downloads + + + ```Thanks to Splunk support for the original version of this search, similar version available in the monitoring console...``` index=_internal $host$ `splunkadmins_splunkd_source` sourcetype=splunkd CacheManager TERM(action=download) TERM(status=succeeded) TERM(download_set=*) +| rex field=cache_id ">*\|(?<index_name>.*)~.*~.*\|" +| eval identifier=(cache_id + host) +| stats count by identifier, index_name +| stats count(eval(count>1)) as duplicate_downloads, sum(count) as all_downloads + count(eval(count>8)) as excessive_duplicate_downloads by index_name +| eval duplicate_percent=if(all_downloads=0,0,round((duplicate_downloads/all_downloads)*100,2)) +| fields index_name, duplicate_percent all_downloads duplicate_downloads excessive_duplicate_downloads +| rename custom_index as Index, duplicate_percent as "Repeat Download %", all_downloads as "All Downloads", duplicate_downloads as "Repeated" + $time.earliest$ + $time.latest$ + + + + + + + + + + CacheManager downloads by age/index + + + ```Thanks to Splunk support for the original version of this search``` index=_audit $host$ TERM(action=remote_bucket_download) TERM(info=completed) +| eval gbps=kb/1024/1024 +| eval age=round((now()-earliest_time)/60/60/24) +| bucket span=30 age +| rex field=cache_id "^[^\|]+\|(?P<index_name>[^~]+)~[^~]+~[^~]+" +| eval age_index = age. " - ".index_name +|timechart span=60s sum(gbps) by age_index limit=10 useother=f usenull=f + $time.earliest$ + $time.latest$ + + + + + + + +
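The queued-download panel above relates to the cache manager's download concurrency; a server.conf sketch for the indexers, shown with the shipped default (the panel's ceiling line of 20 is a reference value, not this setting):

[cachemanager]
max_concurrent_downloads = 8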
diff --git a/apps/SplunkAdmins/default/data/ui/views/splunk_forwarder_data_balance_tuning.xml b/apps/SplunkAdmins/default/data/ui/views/splunk_forwarder_data_balance_tuning.xml new file mode 100644 index 00000000..190873ed --- /dev/null +++ b/apps/SplunkAdmins/default/data/ui/views/splunk_forwarder_data_balance_tuning.xml @@ -0,0 +1,150 @@ +
+ + Attempt to measure data balance between HFs; original version by Brett Adams; similar to splunk_forwarder_output_tuning +
+ + + + -60m@m + now + + + + + `heavyforwarderhosts` + + + + output_name + output_name + + index=_internal $host$ sourcetype=splunkd `splunkadmins_metrics_source` TERM(group=tcpout_connections) +| rex field=name "(?P<output_name>[^:]+)" +| stats count by output_name +| fields output_name + -60m@m + now + + +
+ + + Scatter Line Chart of sum by destination IP + + + index=_internal $host$ sourcetype=splunkd `splunkadmins_metrics_source` component=Metrics TERM(group=tcpout_connections) name=$output_group$* +| timechart span=1m sum(kb) by destIp limit=50 +| fillnull value=0 +| untable _time server kb +| eval t=_time-now() +| table t kb + $time.earliest$ + $time.latest$ + 1 + + + + + + + + + + + + + + + + + Total KB by destination IP + + + index=_internal $host$ sourcetype=splunkd `splunkadmins_metrics_source` component=Metrics TERM(group=tcpout_connections) name=$output_group$* +| timechart span=1m sum(kb) by destIp limit=100 +| fillnull value=0 + $time.earliest$ + $time.latest$ + 1 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Standard Deviation + + + index=_internal $host$ sourcetype=splunkd `splunkadmins_metrics_source` component=Metrics TERM(group=tcpout_connections) name=$output_group$* +| timechart span=1m sum(kb) by destIp limit=50 +| fillnull value=0 +| untable _time destIp kb +| stats avg(kb) as avg stdev(kb) as stdev by _time +| eval devperc = stdev/avg*100 +| table _time devperc + $time.earliest$ + $time.latest$ + + + + + + + + + + + Data Sum + + + index=_internal $host$ sourcetype=splunkd `splunkadmins_metrics_source` component=Metrics TERM(group=tcpout_connections) kb>0 name=$output_group$* +| bin span=1m _time +| stats sum(kb) as kb by destIp _time +| sort _time +| streamstats sum(kb) as sumkb by destIp +| timechart span=1m max(sumkb) by destIp useother=false limit=50 + $time.earliest$ + $time.latest$ + + + + + + + + +
diff --git a/apps/SplunkAdmins/default/data/ui/views/splunk_forwarder_output_tuning.xml b/apps/SplunkAdmins/default/data/ui/views/splunk_forwarder_output_tuning.xml new file mode 100644 index 00000000..93b5613d --- /dev/null +++ b/apps/SplunkAdmins/default/data/ui/views/splunk_forwarder_output_tuning.xml @@ -0,0 +1,136 @@ +
+ + Splunk forwarder to indexer output tuning +
+ + + + -60m@m + now + + + + + `heavyforwarderhosts` + + + + output_name + output_name + + index=_internal $host$ sourcetype=splunkd `splunkadmins_metrics_source` TERM(group=tcpout_connections) +| rex field=name "(?P<output_name>[^:]+)" +| stats count by output_name +| fields output_name + -60m@m + now + + + + + Yes + No + +
+ + + Data output per-second + + + index=_internal $host$ sourcetype=splunkd `splunkadmins_metrics_source` TERM(group=tcpout_connections) name=$output_group$* +| rex field=name "(?<output_name>[^:]+)" +| search output_name=$output_group$ +| fillnull ingest_pipe +| eval combined = output_name . "_" . ingest_pipe +| bin _time span=1m +| stats sum(kb) AS totalkb by combined, host, _time +| eval totalkb=totalkb/60 +| eval combined = $split_by$ . combined +| timechart limit=99 avg(totalkb) AS avgkb, perc95(totalkb) AS perc95kb, min(totalkb) AS minkb by combined + $time.earliest$ + $time.latest$ + + + + + + + + + + Destination count + + + index=_internal $host$ sourcetype=splunkd `splunkadmins_metrics_source` group=tcpout_connections name=$output_group$* +| rex field=name "(?<output_name>[^:]+)" +| search output_name=$output_group$ +| bin _time span=5m +| stats dc(destIp) AS destination_count by output_name, host, _time +| stats min(destination_count) AS min_destination_count, avg(destination_count) AS avg_destination_count by output_name + -24h@h + now + + + +
+
+
+ + + Data output std deviation + + + ```Credit to Brett Adams``` index=_internal $host$ sourcetype=splunkd `splunkadmins_metrics_source` component=Metrics TERM(group=tcpout_connections) name=$output_group$* +| rex field=name "(?P<destination>[^:]+)" +| search destination=$output_group$* +| timechart span=1m sum(kb) by destIp limit=50 +| fillnull value=0 +| untable _time destIp kb +| stats avg(kb) as avg stdev(kb) as stdev by _time +| eval dev_perc = stdev/avg*100 +| table _time dev_perc + $time.earliest$ + $time.latest$ + + + + + + + + + + Dashboard info + + +

Purpose of the destination count table? metrics.log only records the tcpout data *if* the connection is open at the time metrics.log is written, so the count is a sanity check that the number of connections matches the number of forwarders on the backend (this will happen with the below outputs.conf settings combined with regular data flow)

+
+

Asynchronous load balancing (docs.splunk.com)

+

Splunk Asynchronous Forwarding (Lightning-fast data ingestor)

+

Purpose of the data output per-second timechart? The goal is to get close to switching indexers every second for an output group (per pipeline). Note that this results in more open connections to indexers, so it only really works when deployed to a moderate number of intermediate forwarders (HFs or similar). You want to do this with autoLBVolume; lowering autoLBFrequency to a very short time period instead can result in uneven data balance, due to frequent switching while forwarding smaller volumes of data. In my testing so far, aiming above the average kb/s for the autoLBVolume works well; going too low does not

+

Please read the linked article for information on these settings. Note that when using async forwarding the open file descriptor usage is higher than without it, as the connections are held open by the forwarders. This works well on an intermediate forwarding tier; it may not work so well with a very large number of forwarders

+

Also note that the maxQueueSize should not be below 10MB (10MB is the minimum size)

+

If you are using an AWS NLB, you may wish to refer to this newer post Asynchronous forwarding with NLB

+

While this also works on UFs, there are reasons to consider HFs if you are running an intermediate tier; see the Splunk Answers post Wrongly merged Events/permanently blocked tcpout queue with Intermediate Universal Forwarder

+

Finally, you may want to refer to Slow indexer/receiver detection capability

+

What config is used to achieve the above?

+

outputs.conf file based on 1MB/s +

maxQueueSize = 10MB +

+

+ # autoLBVolume is set below 1/5 of the maxQueueSize due to changes post-7.3.6 (which will hopefully be documented in the near future); minimum 10MB queue. A consolidated example stanza is sketched after this dashboard +

+

+ autoLBVolume = 1024000 +

+

+ autoLBFrequency = 10 +

+

+ connectionTTL = 300 +

+ + +
+
+
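Pulling the settings listed above together, a consolidated outputs.conf sketch; the output group name and server list are hypothetical, and the values follow the 1MB/s example given:

[tcpout:primary_indexers]
server = idx1.example.com:9997, idx2.example.com:9997
maxQueueSize = 10MB
autoLBVolume = 1024000
autoLBFrequency = 10
connectionTTL = 300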
diff --git a/apps/SplunkAdmins/default/data/ui/views/splunk_introspection_io_stats.xml b/apps/SplunkAdmins/default/data/ui/views/splunk_introspection_io_stats.xml new file mode 100644 index 00000000..9aa69ba3 --- /dev/null +++ b/apps/SplunkAdmins/default/data/ui/views/splunk_introspection_io_stats.xml @@ -0,0 +1,170 @@ +
+ +
+ + + + -24h@h + now + + + + + `indexerhosts` + + + + 1m + +
+ + + data.avg_total_ms (average wait time) + + perc95 total io service time per host (sum of all disks avg_total_ms) + + index=_introspection sourcetype=splunk_resource_usage component=IOStats $hosts$ data.device=nvme* +| eval avg_total_ms = 'data.avg_total_ms', comment="You may wish to change sum(avg_total_ms) for perc95 or similar depending on your setup..." +| bin _time span=$span$ +| stats sum(avg_total_ms) AS avg_total_ms by host, _time +| timechart span=$span$ partial=f limit=99 perc95(avg_total_ms) AS avg_total_ms by host + $time.earliest$ + $time.latest$ + 1 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + data.read_ps/data.write_ps + + perc95 reads/writes per second (IOPS) + + index=_introspection sourcetype=splunk_resource_usage component=IOStats $hosts$ data.device=nvme* +| eval reads_ps = 'data.reads_ps', writes_ps = 'data.writes_ps' +| bin _time span=$span$ +| stats sum(reads_ps) AS reads_ps, sum(writes_ps) AS writes_ps by host, _time +| timechart span=$span$ partial=f limit=99 perc95(reads_ps) AS reads_ps, perc95(writes_ps) AS writes_ps by host + $time.earliest$ + $time.latest$ + 1 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + data.read_kb_ps/data.write_kb_ps + + perc95 read KB/write KB per second + + index=_introspection sourcetype=splunk_resource_usage component=IOStats $hosts$ data.device=nvme* +| eval reads_kb_ps = 'data.reads_kb_ps', writes_kb_ps = 'data.writes_kb_ps' +| bin _time span=$span$ +| stats sum(reads_kb_ps) AS reads_kb_ps, sum(writes_kb_ps) AS writes_kb_ps by host, _time +| timechart span=$span$ partial=f limit=99 perc95(reads_kb_ps) AS reads_kb_ps, perc95(writes_kb_ps) AS writes_kb_ps by host + $time.earliest$ + $time.latest$ + 1 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
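Note that the three panels above hard-code data.device=nvme*, which assumes NVMe storage on the hosts searched; on other hardware the filter needs broadening or removing. A sketch of the first panel's search with the filter widened:

index=_introspection sourcetype=splunk_resource_usage component=IOStats `indexerhosts` data.device=*
| eval avg_total_ms='data.avg_total_ms'
| bin _time span=1m
| stats sum(avg_total_ms) AS avg_total_ms by host, _time
| timechart span=1m partial=f limit=99 perc95(avg_total_ms) AS avg_total_ms by host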
diff --git a/apps/SplunkAdmins/default/data/ui/views/troubleshooting_indexer_cpu.xml b/apps/SplunkAdmins/default/data/ui/views/troubleshooting_indexer_cpu.xml new file mode 100644 index 00000000..9dc8b902 --- /dev/null +++ b/apps/SplunkAdmins/default/data/ui/views/troubleshooting_indexer_cpu.xml @@ -0,0 +1,461 @@ +
+ +
+ + + + -4h@h + @h + + + + + + + + 10m + 30m + 60m + 120m + 4h + 60m + + + + + -1h@h + @h + + +
+ + + Search Count Per Application + + + index=_introspection `indexerhosts` sourcetype=splunk_resource_usage data.search_props.sid::* | eval app = 'data.search_props.app' +| chart count by app + $CPUtimetoken.earliest$ + $CPUtimetoken.latest$ + + + + + + + + + + + + + + + + + + + + + + + + + + + + CPU Usage By Application (point in time across all indexers) + + CPU is approx CPU% at any point in time + + index=_introspection `indexerhosts` sourcetype=splunk_resource_usage data.search_props.sid::* | eval app = 'data.search_props.app' | eval cpuperc = 'data.pct_cpu' | bin _time span=1m | stats sum(cpuperc) AS totalCPU, avg(cpuperc) AS avgCPU by data.pid, host, _time, app | stats sum(totalCPU) AS totalCPU, sum(avgCPU) AS avgCPU by app | addinfo | eval overThisManyMinutes = round((info_max_time-info_min_time)/60) | eval CPUPercUsed = round(avgCPU/overThisManyMinutes) | fields - totalCPU, info* overThisManyMinutes, avgCPU + $CPUtimetoken.earliest$ + $CPUtimetoken.latest$ + 1 + + + + + + + + + + + + + + + + + + + + + + + + + + + + Searches Running Per Indexer + + + index=_introspection `indexerhosts` sourcetype=splunk_resource_usage data.search_props.sid::* | chart count by host + $CPUtimetoken.earliest$ + $CPUtimetoken.latest$ + 1 + + + + + + + + + + + + + + + + + + + + + + + + + + + + Search Related CPU By Indexer + + CPU is approx CPU% at any point in time + + index=_introspection `indexerhosts` sourcetype=splunk_resource_usage data.search_props.sid::* | eval cpuperc = 'data.pct_cpu' | bin _time span=1m | stats sum(cpuperc) AS totalCPU, avg(cpuperc) AS avgCPU by data.pid, host, _time| stats sum(totalCPU) AS totalCPU, sum(avgCPU) AS avgCPUTotal by host | addinfo | eval overThisManyMinutes = round((info_max_time-info_min_time)/60) | eval CPUPercUsed = round(avgCPUTotal/overThisManyMinutes) | fields - info* overThisManyMinutes, totalCPU, avgCPUTotal + $CPUtimetoken.earliest$ + $CPUtimetoken.latest$ + 1 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + TotalCPU By Indexer And Application + + This is not % CPU, a rough guide only + + index=_introspection `indexerhosts` sourcetype=splunk_resource_usage data.search_props.sid::* | eval app = 'data.search_props.app' | eval cpuperc = 'data.pct_cpu' | chart sum(cpuperc) AS totalCPU by host, app + $time_tok.earliest$ + $time_tok.latest$ + 1 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Search count by app, indexer + + + index=_introspection `indexerhosts` sourcetype=splunk_resource_usage data.search_props.sid::* | eval app = 'data.search_props.app' +| chart count by app, host + $time_tok.earliest$ + $time_tok.latest$ + + + + + + + + Usage by non system users - per $interval$ block of time + + CPU is total measured amount, memory is maximum memory usage by process, 100 is 1 CPU core + + index=_introspection `indexerhosts` sourcetype=splunk_resource_usage data.search_props.sid::* "data.search_props.user"!=admin "data.search_props.user"!=splunk-system-user +| eval mem_used = 'data.mem_used' | eval app = 'data.search_props.app' | eval elapsed = 'data.elapsed' | eval label = 'data.search_props.label' +| eval type = 'data.search_props.type' | eval mode = 'data.search_props.mode' | eval user = 'data.search_props.user' | eval cpuperc = 'data.pct_cpu' +| eval search_head = if(isnull('data.search_props.search_head'),"N/A",'data.search_props.search_head') | eval read_mb = 'data.read_mb' +| eval provenance='data.search_props.provenance' | eval label=coalesce(label, provenance) +| stats max(elapsed) as runtime max(mem_used) as mem_used 
earliest(_time) as searchStartTime, sum(cpuperc) AS totalCPU, avg(cpuperc) AS avgCPU, max(read_mb) AS read_mb by type, mode, app, user, label, host, search_head, data.pid +| bin searchStartTime span=$interval$ +| stats sum(totalCPU) AS totalCPU, sum(mem_used) AS totalMemUsed, sum(runtime) AS totalRuntime, avg(runtime) AS avgRuntime, sum(avgCPU) AS avgCPUAcrossAllIndexers, sum(read_mb) AS totalReadMB by searchStartTime, type, mode, app, user +| eval totalduration = tostring(totalRuntime, "duration"), averageduration = tostring(avgRuntime, "duration") +| eval Started = strftime(searchStartTime,"%+") +| eval avgCPUAcrossAllIndexers = round(avgCPUAcrossAllIndexers) +| sort - totalCPU, totalMemUsed +| eval totalCPU=tostring(totalCPU,"commas"), avgCPUAcrossAllIndexers=tostring(avgCPUAcrossAllIndexers,"commas") +| fields Started, totalMemUsed, user, app, mode, type, averageduration, totalduration, totalCPU, avgCPUAcrossAllIndexers, totalReadMB + $time_tok.earliest$ + $time_tok.latest$ + 1 + + + + + + + + + + if($click.name2$="app", $click.value2$, "*" + if($click.name2$="user", $click.value2$, "" + /app/SplunkAdmins/troubleshooting_indexer_cpu_drilldown?form.app=$app$&form.user=$user$ + +
+
+
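A note on how the CPU figures above are derived: splunk_resource_usage emits a data.pct_cpu sample for each search process roughly every 10 seconds (the per-user dashboard later in this diff notes introspection is measured in 10 second blocks), so these panels first average per process (data.pid) per host per minute, sum those averages, and divide by the number of minutes in the window to approximate concurrent CPU%. The same calculation in isolation, as a minimal sketch using the field names from the panels above:

  index=_introspection `indexerhosts` sourcetype=splunk_resource_usage data.search_props.sid::*
  | eval cpuperc='data.pct_cpu'
  | bin _time span=1m
  | stats avg(cpuperc) AS avgCPU by data.pid, host, _time
  | stats sum(avgCPU) AS avgCPUTotal by host
  | addinfo
  | eval overThisManyMinutes=round((info_max_time-info_min_time)/60)
  | eval CPUPercUsed=round(avgCPUTotal/overThisManyMinutes)

As the panel descriptions say, 100 corresponds to one fully used core, so CPUPercUsed can legitimately exceed 100 on multi-core indexers.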
+ + + Usage by system users per $interval$ block of time + + CPU is totalMeasuredAmount, memory is maximum memory usage by process, 100 is 1 CPU core + + index=_introspection `indexerhosts` sourcetype=splunk_resource_usage data.search_props.sid::* "data.search_props.user"=admin OR "data.search_props.user"=splunk-system-user +| eval mem_used = 'data.mem_used' | eval app = 'data.search_props.app' | eval elapsed = 'data.elapsed' | eval label = 'data.search_props.label' +| eval type = 'data.search_props.type' | eval mode = 'data.search_props.mode' | eval user = 'data.search_props.user' | eval cpuperc = 'data.pct_cpu' +| eval search_head = if(isnull('data.search_props.search_head'),"N/A",'data.search_props.search_head') | eval read_mb = 'data.read_mb' +| eval provenance='data.search_props.provenance' | eval label=coalesce(label, provenance) +| stats max(elapsed) as runtime max(mem_used) as mem_used earliest(_time) as searchStartTime, sum(cpuperc) AS totalCPU, avg(cpuperc) AS avgCPU, max(read_mb) AS read_mb by type, mode, app, user, label, host, search_head, data.pid +| bin searchStartTime span=$interval$ +| stats sum(totalCPU) AS totalCPU, sum(mem_used) AS totalMemUsed, sum(runtime) AS totalRuntime, avg(runtime) AS avgRuntime, sum(avgCPU) AS avgCPUAcrossAllIndexers, sum(read_mb) AS totalReadMB by searchStartTime, type, mode, app, user +| eval totalduration = tostring(totalRuntime, "duration"), averageduration = tostring(avgRuntime, "duration") +| eval Started = strftime(searchStartTime,"%+") +| eval avgCPUAcrossAllIndexers = round(avgCPUAcrossAllIndexers) +| sort - totalCPU, totalMemUsed +| eval totalCPU=tostring(totalCPU,"commas"), avgCPUAcrossAllIndexers=tostring(avgCPUAcrossAllIndexers,"commas") +| fields Started, totalMemUsed, user, app, mode, type, averageduration, totalduration, totalCPU, avgCPUAcrossAllIndexers, totalReadMB + $time_tok.earliest$ + $time_tok.latest$ + 1 + + + + + + + + + + if($click.name2$="app", $click.value2$, "*" + if($click.name2$="user", $click.value2$, "" + /app/SplunkAdmins/troubleshooting_indexer_cpu_drilldown?form.app=$app$&form.user=$user$ + +
+
+
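Both $interval$ panels share a two-stage aggregation: first collapse the 10-second samples to one row per search process (max elapsed, max mem_used, summed CPU), then bucket the derived start times so each output row covers the searches a given user/app started within one block. Stripped to its skeleton, with 60m shown in place of $interval$ (its apparent default):

  | stats max(elapsed) AS runtime, max(mem_used) AS mem_used, earliest(_time) AS searchStartTime, sum(cpuperc) AS totalCPU, avg(cpuperc) AS avgCPU by type, mode, app, user, label, host, search_head, data.pid
  | bin searchStartTime span=60m
  | stats sum(totalCPU) AS totalCPU, sum(mem_used) AS totalMemUsed, sum(avgCPU) AS avgCPUAcrossAllIndexers by searchStartTime, type, mode, app, user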
+ + + CPU used per indexer per search label, CPU measured at point in time + + + _ACCELERATE* + No Exclusion + __DONTEXCLUDE__ + + + + avgCPU, memory + totalCPU, memory + duration, totalCPU + duration, avgCPU + totalAVGCPU, totalMemUsed + totalAVGCPU, totalMemUsed + + + CPU is approx CPU% at any point in time, memory is maximum memory usage by process, 100 is 1 CPU core + + index=_introspection `indexerhosts` sourcetype=splunk_resource_usage data.search_props.sid::* NOT ("data.search_props.label"=$labelExclusion$) +| eval mem_used = 'data.mem_used' | eval app = 'data.search_props.app' | eval elapsed = 'data.elapsed' | eval label = 'data.search_props.label' +| eval type = 'data.search_props.type' | eval mode = 'data.search_props.mode' | eval user = 'data.search_props.user' | eval cpuperc = 'data.pct_cpu' +| eval read_mb = 'data.read_mb' +| eval provenance='data.search_props.provenance' +| eval label=coalesce(label, provenance) +| eval search_head = if(isnull('data.search_props.search_head'),"N/A",'data.search_props.search_head') +| stats max(elapsed) as runtime max(mem_used) as mem_used earliest(_time) as Started, sum(cpuperc) AS totalCPU, max(read_mb) AS read_mb, avg(cpuperc) AS avgCPU by type, mode, app, user, label, host, data.pid +| stats sum(avgCPU) AS totalAVGCPU, sum(mem_used) AS totalMemUsed, sum(runtime) AS totalRuntime, sum(read_mb) AS totalReadMB, sum(totalCPU) AS totalCPU by Started, type, "mode", app, user, label, host +| eval totalMemUsed = round(totalMemUsed, 2) +| eval Started=strftime(Started,"%+") +| eval duration = tostring(totalRuntime, "duration") +| eval avgCPU = round(totalAVGCPU) +| sort - $sort$ +| eval totalCPU=tostring(totalCPU,"commas"), avgCPU=tostring(avgCPU,"commas") +| fields - totalRuntime, totalAVGCPU + $time_tok.earliest$ + $time_tok.latest$ + 1 + + + + + + + + + + if($click.name2$="app", $click.value2$, "*" + if($click.name2$="user", $click.value2$, "" + /app/SplunkAdmins/troubleshooting_indexer_cpu_drilldown?form.app=$app$&form.user=$user$ + +
+
+
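About the exclusion dropdown above: data model acceleration searches appear to be labelled _ACCELERATE_<...>, and they often dominate CPU tables, so the default token value _ACCELERATE* hides them; the "No Exclusion" choice swaps in the placeholder __DONTEXCLUDE__, which matches no real label. Expanded into the query, the two choices become (illustrative):

  ... NOT ("data.search_props.label"=_ACCELERATE*)      default: hide acceleration searches
  ... NOT ("data.search_props.label"=__DONTEXCLUDE__)   "No Exclusion": effectively a no-op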
+ + + Most Expensive Non System Queries with CPU measured at point in time + + + avgCPU, memory + totalCPU, memory + duration, totalCPU + duration, avgCPU + totalAVGCPU, totalMemUsed + totalAVGCPU, totalMemUsed + + + CPU is approx CPU% at any point in time, memory is maximum memory usage by process, 100 is 1 CPU core + + index=_introspection `indexerhosts` sourcetype=splunk_resource_usage data.search_props.sid::* "data.search_props.user"!=admin "data.search_props.user"!=splunk-system-user +| eval mem_used = 'data.mem_used' | eval app = 'data.search_props.app' | eval elapsed = 'data.elapsed' | eval label = 'data.search_props.label' +| eval type = 'data.search_props.type' | eval mode = 'data.search_props.mode' | eval user = 'data.search_props.user' | eval cpuperc = 'data.pct_cpu' +| eval read_mb = 'data.read_mb' +| eval provenance='data.search_props.provenance' +| eval label=coalesce(label, provenance) +| eval search_head = if(isnull('data.search_props.search_head'),"N/A",'data.search_props.search_head') +| stats max(elapsed) as runtime max(mem_used) as mem_used earliest(_time) as Started, sum(cpuperc) AS totalCPU, max(read_mb) AS read_mb, avg(cpuperc) AS avgCPU by type, mode, app, user, label, host, data.pid +| stats sum(avgCPU) AS totalAVGCPU, sum(mem_used) AS totalMemUsed, sum(runtime) AS totalRuntime, sum(read_mb) AS totalReadMB, sum(totalCPU) AS totalCPU by Started, type, "mode", app, user, label, host +| eval totalMemUsed = round(totalMemUsed, 2) +| eval Started=strftime(Started,"%+") +| eval duration = tostring(totalRuntime, "duration") +| eval avgCPU = round(totalAVGCPU) +| sort - $sort2$ +| eval totalCPU=tostring(totalCPU,"commas"), avgCPU=tostring(avgCPU,"commas") +| fields - totalRuntime, totalAVGCPU + $time_tok.earliest$ + $time_tok.latest$ + 1 + + + + + + + + + + if($click.name2$="app", $click.value2$, "*" + if($click.name2$="user", $click.value2$, "" + /app/SplunkAdmins/troubleshooting_indexer_cpu_drilldown?form.app=$app$&form.user=$user$ + +
+
+
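Each of these tables ends with the same drilldown block, whose XML was mangled in this rendering; from the surviving fragments it conditionally sets app and user tokens from the clicked cell and then opens the CPU drilldown view. A plausible reconstruction, assuming standard Simple XML <drilldown> with <eval> and <link> elements:

  <drilldown>
    <eval token="app">if($click.name2$="app", $click.value2$, "*")</eval>
    <eval token="user">if($click.name2$="user", $click.value2$, "*")</eval>
    <link target="_blank">/app/SplunkAdmins/troubleshooting_indexer_cpu_drilldown?form.app=$app$&amp;form.user=$user$</link>
  </drilldown>

The fallback for user survives only as an empty string in the fragments, so the second "*" above is an assumption.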
+ + + CPU used on a per SID basis + + + avgCPU, memory + totalCPU, memory + duration, totalCPU + duration, avgCPU + totalRuntime, totalCPU + + + + _ACCELERATE* + No Exclusion + __DONTEXCLUDE__ + + + + index=_introspection `indexerhosts` sourcetype=splunk_resource_usage data.search_props.sid::* NOT ("data.search_props.label"=$labelExclusion2$) +| eval mem_used = 'data.mem_used' | eval app = 'data.search_props.app' | eval elapsed = 'data.elapsed' | eval label = 'data.search_props.label' +| eval type = 'data.search_props.type' | eval mode = 'data.search_props.mode' | eval user = 'data.search_props.user' | eval cpuperc = 'data.pct_cpu' +| eval read_mb = 'data.read_mb' +| eval sid='data.search_props.sid' +| eval provenance='data.search_props.provenance' +| eval label=coalesce(label, provenance) +| eval search_head = if(isnull('data.search_props.search_head'),"N/A",'data.search_props.search_head') +| stats max(elapsed) as runtime max(mem_used) as mem_used earliest(_time) as Started, sum(cpuperc) AS totalCPU, max(read_mb) AS read_mb, avg(cpuperc) AS avgCPUPerMinute by type, mode, app, user, label, host, data.pid, sid +| stats sum(avgCPUPerMinute) AS totalAVGCPUPerMinute, sum(mem_used) AS totalMemUsed, sum(runtime) AS totalRuntime, sum(read_mb) AS totalReadMB, sum(totalCPU) AS totalCPU by Started, type, "mode", app, user, label, host, sid, data.pid +| eval totalMemUsed = round(totalMemUsed, 2) +| eval Started=strftime(Started,"%+") +| eval duration = tostring(totalRuntime, "duration") +| eval avgCPU = round(totalAVGCPUPerMinute) +| sort - $sort3$ +| eval totalCPU=tostring(totalCPU,"commas"), avgCPU=tostring(avgCPU,"commas") +| fields - totalRuntime, totalAVGCPUPerMinute, sid + $time_tok.earliest$ + $time_tok.latest$ + 1 + + + + + + + + + + if($click.name2$="app", $click.value2$, "*" + if($click.name2$="user", $click.value2$, "" + /app/SplunkAdmins/troubleshooting_indexer_cpu_drilldown?form.app=$app$&form.user=$user$ + +
+
+
+
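Everything in this dashboard scopes _introspection through the `indexerhosts` macro, which (per the macros.conf later in this diff) ships as host=* and is explicitly meant to be customised; on anything but an all-in-one instance you would narrow it, e.g. (hypothetical host pattern):

  [indexerhosts]
  definition = host=idx-*
  iseval = 0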
diff --git a/apps/SplunkAdmins/default/data/ui/views/troubleshooting_indexer_cpu_drilldown.xml b/apps/SplunkAdmins/default/data/ui/views/troubleshooting_indexer_cpu_drilldown.xml new file mode 100644 index 00000000..1a66654a --- /dev/null +++ b/apps/SplunkAdmins/default/data/ui/views/troubleshooting_indexer_cpu_drilldown.xml @@ -0,0 +1,118 @@ +
+ +
+ <!-- form inputs (original markup lost in this rendering): a time picker defaulting to -4h@m .. now ($time_tok$); a sort dropdown over "avgCPU, memory", "totalCPU, memory", "duration, totalCPU", "duration, avgCPU", "totalAVGCPU, totalMemUsed"; a user input expanded into the query with the prefix data.search_props.user= (default *); surviving fragments "true", "$value$" and "Yes" belonged to elements that could not be recovered -->
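Because the user token carries its own field prefix (data.search_props.user=), the panels below can interpolate $user$ directly into the base search. With the default value of * and an example app token, the per-PID panel's first line expands to something like this (the app name is illustrative):

  index=_introspection `indexerhosts` sourcetype=splunk_resource_usage data.search_props.sid::* data.search_props.user=* data.search_props.app=search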
+ + + Usage Drilldown Per PID + + + index=_introspection `indexerhosts` sourcetype=splunk_resource_usage data.search_props.sid::* $user$ data.search_props.app=$app$ +| eval mem_used = 'data.mem_used' | eval app = 'data.search_props.app' | eval elapsed = 'data.elapsed' | eval label = 'data.search_props.label' +| eval type = 'data.search_props.type' | eval mode = 'data.search_props.mode' | eval user = 'data.search_props.user' | eval cpuperc = 'data.pct_cpu' +| eval read_mb = 'data.read_mb' +| eval sid='data.search_props.sid' +| eval provenance='data.search_props.provenance' | eval label=coalesce(label, provenance) +| eval search_head = if(isnull('data.search_props.search_head'),"N/A",'data.search_props.search_head') +| stats max(elapsed) as runtime max(mem_used) as mem_used earliest(_time) as Started, sum(cpuperc) AS totalCPU, max(read_mb) AS read_mb, avg(cpuperc) AS avgCPUPerMinute by type, mode, app, user, label, host, data.pid, sid +| stats sum(avgCPUPerMinute) AS totalAVGCPUPerMinute, sum(mem_used) AS totalMemUsed, sum(runtime) AS totalRuntime, sum(read_mb) AS totalReadMB, sum(totalCPU) AS totalCPU by Started, type, "mode", app, user, label, host, sid, data.pid +| eval totalMemUsed = round(totalMemUsed, 2) +| eval Started=strftime(Started,"%+") +| eval duration = tostring(totalRuntime, "duration") +| eval avgCPU = round(totalAVGCPUPerMinute) +| eval totalCPU=tostring(totalCPU,"commas"), avgCPU=tostring(avgCPU,"commas") +| sort - totalRuntime, totalCPU +| fields - totalRuntime, totalAVGCPUPerMinute, sid + $time_tok.earliest$ + $time_tok.latest$ + 1 + + + + + + + + +
+
+
+ + + + Usage Drilldown Per Search Label + + index=_introspection `indexerhosts` sourcetype=splunk_resource_usage data.search_props.sid::* $user$ data.search_props.app=$app$ +| eval mem_used = 'data.mem_used' | eval app = 'data.search_props.app' | eval elapsed = 'data.elapsed' | eval label = 'data.search_props.label' +| eval type = 'data.search_props.type' | eval mode = 'data.search_props.mode' | eval user = 'data.search_props.user' | eval cpuperc = 'data.pct_cpu' +| eval read_mb = 'data.read_mb' +| eval provenance='data.search_props.provenance' | eval label=coalesce(label, provenance) +| eval search_head = if(isnull('data.search_props.search_head'),"N/A",'data.search_props.search_head') +| bin _time span=1m +| stats max(elapsed) as runtime max(mem_used) as mem_used earliest(_time) as Started, sum(cpuperc) AS totalCPU, max(read_mb) AS read_mb, avg(cpuperc) AS avgCPU by type, mode, app, user, label, data.pid, host +| stats sum(avgCPU) AS totalAVGCPU, sum(mem_used) AS totalMemUsed, sum(runtime) AS totalRuntime, sum(read_mb) AS totalReadMB, sum(totalCPU) AS totalCPU by Started, type, "mode", app, user, label +| eval totalMemUsed = round(totalMemUsed, 2) +| eval Started=strftime(Started,"%+") +| eval duration = tostring(totalRuntime, "duration") +| eval avgCPU = round(totalAVGCPU) +| eval totalCPU=tostring(totalCPU,"commas"), avgCPU=tostring(avgCPU,"commas") +| sort - $sort$ +| fields - totalRuntime, totalAVGCPU + $time_tok.earliest$ + $time_tok.latest$ + + +
+
+
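One subtlety in the per-label panel above: bin _time span=1m runs before the first stats, so earliest(_time) (the Started column) is minute-aligned; without it, two indexers reporting the same scheduled search a few seconds apart would land in separate rows once the second stats groups by Started. In isolation:

  | bin _time span=1m
  | stats earliest(_time) AS Started, sum(cpuperc) AS totalCPU by type, mode, app, user, label, data.pid, host
  | stats sum(totalCPU) AS totalCPU by Started, type, mode, app, user, label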
+ + + + Recently Used URL By User + + index=_internal sourcetype=splunkd_ui_access user=$uservalue$ `searchheadhosts` | top referer + $time_tok.earliest$ + $time_tok.latest$ + +
+
+
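The referer field in splunkd_ui_access carries the full URL of the page that issued the request, so top referer surfaces the dashboards the user has been viewing. A variant that reduces the URL to just app and view, offered as a sketch rather than as part of the app:

  index=_internal sourcetype=splunkd_ui_access user=$uservalue$ `searchheadhosts`
  | rex field=referer "/app/(?<app>[^/]+)/(?<view>[^/?]+)"
  | top app, view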
+
diff --git a/apps/SplunkAdmins/default/data/ui/views/troubleshooting_resource_usage_per_user.xml b/apps/SplunkAdmins/default/data/ui/views/troubleshooting_resource_usage_per_user.xml new file mode 100644 index 00000000..4c0a541a --- /dev/null +++ b/apps/SplunkAdmins/default/data/ui/views/troubleshooting_resource_usage_per_user.xml @@ -0,0 +1,91 @@ +
+ + This dashboard helps identify which searches are consuming excessive CPU, memory, or disk IOPS at the indexing tier, and surfaces the queries behind them
+ <!-- form inputs (original markup lost in this rendering): a time picker defaulting to -4h@m .. now ($time$); an exclusion choice (Yes/No) whose value injects "data.search_props.user"!=admin "data.search_props.user"!=splunk-system-user to hide system users; a sort dropdown over totalCPU, avgCPUPerIndexer, totalduration, averageduration, totalMemUsed, totalReadMB, count (default totalCPU); a timespan input defaulting to 60m ($timespan$); and a filter text input defaulting to "" ($filter$) -->
+
+
+ Resource Usage Per User
+
+ count is the number of searches triggered during that time period (dashboards may have multiple searches), introspection is measured in 10 second blocks (so sometimes no stats are available)
+
+ index=_introspection `indexerhosts` sourcetype=splunk_resource_usage data.search_props.sid::* $exclusion$
+| eval mem_used = 'data.mem_used'
+| eval app = 'data.search_props.app'
+| eval elapsed = 'data.elapsed'
+| eval label = 'data.search_props.label'
+| eval type = 'data.search_props.type'
+| eval mode = 'data.search_props.mode'
+| eval user = 'data.search_props.user'
+| eval cpuperc = 'data.pct_cpu'
+| eval search_head = 'data.search_props.search_head'
+| eval read_mb = 'data.read_mb'
+| eval provenance='data.search_props.provenance'
+| eval label=coalesce(label, provenance)
+| eval sid='data.search_props.sid'
+| search $filter$
+| rex field=sid "^remote_[^_]+_(?P<sid>.*)"
+| eval sid = "'" . sid . "'"
+| fillnull search_head value="*"
+| stats max(elapsed) as runtime max(mem_used) as mem_used earliest(_time) as searchStartTime, sum(cpuperc) AS totalCPU, avg(cpuperc) AS avgCPU, max(read_mb) AS read_mb, values(sid) AS sids by type, mode, app, user, label, host, search_head, data.pid
+| bin searchStartTime span=$timespan$
+| stats dc(sids) AS count, sum(totalCPU) AS totalCPU, sum(mem_used) AS totalMemUsed, max(runtime) AS maxRunTime, avg(runtime) AS avgRuntime, avg(avgCPU) AS avgCPUPerIndexer, sum(read_mb) AS totalReadMB, values(sids) AS sids by searchStartTime, type, mode, app, user, search_head, label
+| eval maxduration = tostring(maxRunTime, "duration"), averageduration = tostring(avgRuntime, "duration")
+| eval Started = strftime(searchStartTime,"%+")
+| eval avgCPUPerIndexer = round(avgCPUPerIndexer)
+| sort - $sort$
+| eval totalCPU=tostring(totalCPU,"commas"), avgCPUPerIndexer=tostring(avgCPUPerIndexer,"commas"), totalReadMB=tostring(totalReadMB, "commas"), totalMemUsed=tostring(totalMemUsed, "commas")
+| table Started, count, user, app, label, averageduration, maxduration, totalCPU, avgCPUPerIndexer, totalReadMB, totalMemUsed, search_head, sids, mode, type
+ $time.earliest$
+ $time.latest$
+ 1
+
+
+ ["Started","count","user","app","label","averageduration","maxduration","totalCPU","avgCPUPerIndexer","totalReadMB","totalMemUsed","mode","type"]
+
+ /app/SplunkAdmins/troubleshooting_resource_usage_per_user_drilldown?form.sid=$row.sids$&form.host=$row.search_head$&form.app=$row.app$&form.label=$row.label$&form.time.earliest=$time.earliest$&form.time.latest=$time.latest$
+
+
+
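Two details in the per-user search above are easy to miss. On the indexers, the introspection SID of a search-head-initiated search arrives prefixed with remote_<searchhead>_, so the rex strips that prefix (re-capturing into sid) before dc(sids) counts distinct searches; each value is then wrapped in single quotes because that is how search_id is recorded in _audit, which keeps the drilldown's search_id IN ($sid$) clause valid. A standalone sketch of just the normalization:

  | eval sid='data.search_props.sid'
  | rex field=sid "^remote_[^_]+_(?P<sid>.*)"
  | eval sid = "'" . sid . "'"
  | stats dc(sid) AS count, values(sid) AS sids by user, app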
+
diff --git a/apps/SplunkAdmins/default/data/ui/views/troubleshooting_resource_usage_per_user_drilldown.xml b/apps/SplunkAdmins/default/data/ui/views/troubleshooting_resource_usage_per_user_drilldown.xml new file mode 100644 index 00000000..74c076ec --- /dev/null +++ b/apps/SplunkAdmins/default/data/ui/views/troubleshooting_resource_usage_per_user_drilldown.xml @@ -0,0 +1,55 @@ +
+ + Drilldown for Troubleshooting Resource Usage Per User (Splunk 6.6+ only due to the use of the IN keyword) +
+ <!-- form inputs (original markup lost in this rendering): tokens for sid, host, app and label populated from the parent dashboard's drilldown link, plus a time picker defaulting to -15m .. now ($time$) -->
+
+
+ Query information from audit logs
+
+
+ index=_audit host=$host$ "info=granted" OR "info=completed" OR "info=canceled" search_id IN ($sid$)
+| rex ", search='(?P<search>[\S+\s+]+?)', "
+| stats min(_time) AS time, max(_time) AS max_timestamp, values(user) AS user, values(total_run_time) AS total_run_time, values(result_count) AS result_count, values(search) AS search, values(host) AS host, values(search_et) AS startTime, values(search_lt) AS endTime, values(info) AS info, values(savedsearch_name) AS savedsearch_name by search_id
+| eval app="$app$", label="$label$"
+| eval endTime=if((info=="completed" OR info=="canceled") AND endTime=="N/A",max_timestamp,endTime)
+| eval period=tostring(round(endTime-startTime), "duration")
+| eval startTime=strftime(startTime, "%Y-%m-%d %H:%M:%S"), endTime=strftime(endTime, "%Y-%m-%d %H:%M:%S")
+| fillnull value="All Time" startTime endTime period
+| table time, app, user, total_run_time, result_count, period, search, label, host, startTime, endTime, info, savedsearch_name, search_id
+| sort - time
+ $time.earliest$
+ $time.latest$
+ 1
+
+
+
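The audit events queried above are written once per state change, so filtering to info=granted/completed/canceled and grouping by search_id reconstructs each search's lifecycle. Outside the dashboard, the same correlation can be run by hand for a single SID; the SID below is the illustrative one from the props.conf comments later in this diff:

  index=_audit "info=granted" OR "info=completed" OR "info=canceled" search_id IN ("'1527150641.164891_315974D3-2FA6-4A16-839A-A95A0376BA14'")
  | stats min(_time) AS time, values(user) AS user, values(info) AS info, values(savedsearch_name) AS savedsearch_name, values(search) AS search by search_id

Note the embedded single quotes around the SID: _audit records search_id values quoted that way, which is why the per-user dashboard wraps its sids before passing them to this drilldown.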
+
diff --git a/apps/SplunkAdmins/default/macros.conf b/apps/SplunkAdmins/default/macros.conf new file mode 100644 index 00000000..096ce5c6 --- /dev/null +++ b/apps/SplunkAdmins/default/macros.conf @@ -0,0 +1,926 @@ +############## +# +# Customise these macros to ensure the SplunkAdmins / Alerts for Splunk Admins +# application works as expected +# +############## +[indexerhosts] +definition = host=* +iseval = 0 + +[heavyforwarderhosts] +definition = host=* +iseval = 0 + +[searchheadhosts] +definition = host=* +iseval = 0 + +#Designed for searches where returning data from other search heads +#would not provide valid results... +[localsearchheadhosts] +definition = host=* +iseval = 0 + +[splunkenterprisehosts] +definition = host=* +iseval = 0 + +[deploymentserverhosts] +definition = host=* +iseval = 0 + +[licensemasterhost] +definition = host=* +iseval = 0 + +[cluster_masters] +definition = host=* +iseval = 0 + +[sysloghosts] +definition = host=* +iseval = 0 + +[searchheadsplunkservers] +definition = splunk_server=* +iseval = 0 + +[splunkindexerhostsvalue] +definition = splunk_server=* +iseval = 0 + +[splunkadmins_splunkd_source] +definition = source=*splunkd.log* +iseval = 0 + +[splunkadmins_splunkuf_source] +definition = source=*splunkd.log* +iseval = 0 + +[splunkadmins_mongo_source] +definition = source=*mongod.log* +iseval = 0 + +[splunkadmins_license_usage_source] +definition = source=*license_usage.log* +iseval = 0 + +[splunkadmins_clustermaster_oshost] +definition = host=changeme +iseval = 0 + +#Only used in a few searches, customise this if you have the cluster master as a search +#peer, if not you may wish to leave this to local and run the ClusterMasterLevel searches on +#the cluster master server... +[splunkadmins_clustermaster_host] +definition = splunk_server=local + +############## +# +# Utility functions +# +############## +[comment(1)] +args = text +definition = "" +iseval = 0 + +# +#Dynamically generate a Splunk SPL statement to filter out a list of hosts / time periods where the particular hosts +#were restarting, for example if search heads were restarting we probably don't care about delayed scheduled searches at this point in time +#Allowing a macro name to be passed in allows this function to be used for search heads or indexers or anything else +#Furthermore allowing contingency time allows some time for the server to recover from the restart if required... +#This macro returns in the form of ((host=X _time>start _timestart _time" . minTime . " _time<" .maxTime\ +| fields search\ +| format\ +| rex mode=sed field=search "s/\"//g" +iseval = 0 + +# +#Dynamically generate a Splunk SPL statement to filter out a list of keywords (hostnames without the host=) / time periods where the particular hosts +#were restarting, for example if search heads were restarting we probably don't care about delayed scheduled searches at this point in time +#Allowing a macro name to be passed in allows this function to be used for search heads or indexers or anything else +#Furthermore allowing contingency time allows some time for the server to recover from the restart if required... +#This macro returns in the form of ((X _time>start _timestart _time" . minTime . 
" _time<" .maxTime\ +| fields search\ +| format\ +| rex mode=sed field=search "s/\"//g" +iseval = 0 + +# +#Dynamically generate a Splunk SPL statement to filter out a list of hosts / time periods where the particular hosts +#were restarting, for example if search heads were restarting we probably don't care about delayed scheduled searches at this point in time +#Allowing a macro name to be passed in allows this function to be used for search heads or indexers or anything else +#Furthermore allowing contingency time allows some time for the server to recover from the restart if required... +#This macro returns in the form of (_time>start _timestart _time` within the audit.log files with the audit definition based on a lookup file +#note this version only substitutes the first macro seen...the Splunk 8 version can handle multiple macros at once +[splunkadmins_audit_logs_macro_sub] +definition = ```Set all values to null() in case this macro is called again within the same search. Subsitute a macro used inside a search with the definition found in the lookup file```\ +| eval definition=null(), commas=null(), commas2=null(), argCount2=null(), argCount=null(), match=null()\ +| rex field=search max_match=1 "\`(?!\")(?!')(?P[^\`]+)\`" \ +```You can have multiple macro definitions with either 0 or more arguments so we have to count them...``` \ +| rex max_match=10 field=macro "([^\"]+\")|([^']+')\s*(?P,)" \ +| rex max_match=10 field=macro "(?P,)" \ +| rex max_match=1 field=macro "(?P[^\(]+\()" \ +```Two count methods are used as if we have macro(arg1) that has no commas, but macro(arg1,arg2) will work as expected...``` \ +| eval argCount2=if(match(macro,"([^\"]+\")|([^']+')") AND isnull(commas),-1,if(isnotnull(commas2),mvcount(commas2),null())) \ +| eval argCount=if(isnull(argCount2),0,argCount2+1) \ +| eval argCount=if(argCount==0,if(isnotnull(match),1,0),argCount) \ +| rex field=macro "(?P^[^\( ]+)" \ +| eval macroName=if(argCount==0,macro,macro . "(" . argCount . ")") \ +| lookup splunkadmins_macros title AS macroName, app AS app_name, splunk_server \ +| eval app_name2="global"\ +| lookup splunkadmins_macros title AS macroName, app AS app_name2, splunk_server OUTPUTNEW definition\ +| lookup splunkadmins_macros title AS macroName, splunk_server OUTPUTNEW definition\ +| eval macroReplace=if((argCount == 0),(("`" . macro) . "`"),(("`" . macro) . "\\(.*?\\)`")), search=if(isnotnull(definition),replace(search,macroReplace,mvindex(definition,0)),search) +iseval = 0 + +#Substitute `` within the audit.log files with the audit definition based on a lookup file +#note this version only works on Splunk 8 due to the use of mvmap +[splunkadmins_audit_logs_macro_sub_v8] +definition = ```Set all values to null() in case this macro is called again within the same search. 
Subsitute a macro used inside a search with the definition found in the lookup file``` \ +eval definition=null(), definition2=null(), definition3=null(), commas=null(), commas2=null(), argCount2=null(), argCount=null(), match=null() \ +| rex field=search "\\`(?!\")(?!')(?P[^\\`]+)\\`" max_match=20 \ + ```remove any commas inside double quotes or single quotes inside a macro, they are probably not arguments to the macro itself``` \ +| eval remove_commas_inside_macros=mvmap(macro,replace(macro,"(\"[^\"]+\"|'[^']+')","")) \ + ```Originally a regex, the replace+len works in mvmap and determines number of commas so we can find a macro name``` \ +| eval commas2=mvmap(remove_commas_inside_macros,if(match(remove_commas_inside_macros,"^[^\(]+$"),"-1",len(replace(remove_commas_inside_macros,"[^,]+",""))+1)) \ +| rex field=macro "(?P^[^\( ]+)" max_match=20 \ +| eval macro_commas=mvzip(macro_name,commas2,"!!!!!!!") \ + ```A macro with zero arguments is -1 from the previous mvmap, if it has non-zero arguments the definition changes to macro(number)...``` \ +| eval macroName=mvmap(macro_commas,if(mvindex(split(macro_commas,"!!!!!!!"),1)=="-1",mvindex(split(macro_commas,"!!!!!!!"),0),mvindex(split(macro_commas,"!!!!!!!"),0) . "(" . mvindex(split(macro_commas,"!!!!!!!"),1) . ")")) \ +| lookup splunkadmins_macros title AS macroName, app AS app_name, splunk_server \ +| eval app_name2="global" \ + ```The original version just did an OUTPUTNEW definition, however this has the limitation that if 1 of the 5 macros found resolves, output stops. And this can result in missing macros. So this version over-matches but that appears to be the tradeoff...without making this even more complicated``` \ +| lookup splunkadmins_macros title AS macroName, app AS app_name2, splunk_server OUTPUT definition AS definition2 \ +| lookup splunkadmins_macros title AS macroName, splunk_server OUTPUT definition AS definition3 \ +| eval definition=mvdedup(mvappend(definition,definition2,definition3)) \ +| fillnull definition value="macronotfound" \ +| nomv definition \ +| eval definition=" " . definition . " " \ + ```While an mvmap could replace per-macro that results in a multivalue output. Also replace doesn't handle a multivalued replacement argument so just replace the first macro if it exists with the definitions of all the macros, close enough for what we want``` \ +| eval search=if(isnotnull(macro_name),replace(search,mvindex(macro_name,0),definition),search) +iseval = 0 + +#Substitute `` within the any file +[splunkadmins_macro_sub(1)] +args = fieldname +definition = ```Set all values to null() in case this macro is called again within the same search. 
Subsitute a macro used inside a search with the definition found in the lookup file``` \ +eval definition=null(), definition2=null(), definition3=null(), commas=null(), commas2=null(), argCount2=null(), argCount=null(), match=null() \ +| rex field=$fieldname$ "\\`(?!\")(?!')(?P[^\\`]+)\\`" max_match=20 \ + ```remove any commas inside double quotes or single quotes inside a macro, they are probably not arguments to the macro itself``` \ +| eval remove_commas_inside_macros=mvmap(macro,replace(macro,"(\"[^\"]+\"|'[^']+')","")) \ + ```Originally a regex, the replace+len works in mvmap and determines number of commas so we can find a macro name``` \ +| eval commas2=mvmap(remove_commas_inside_macros,if(match(remove_commas_inside_macros,"^[^\(]+$"),"-1",len(replace(remove_commas_inside_macros,"[^,]+",""))+1)) \ +| rex field=macro "(?P^[^\( ]+)" max_match=20 \ +| eval macro_commas=mvzip(macro_name,commas2,"!!!!!!!") \ + ```A macro with zero arguments is -1 from the previous mvmap, if it has non-zero arguments the definition changes to macro(number)...``` \ +| eval macroName=mvmap(macro_commas,if(mvindex(split(macro_commas,"!!!!!!!"),1)=="-1",mvindex(split(macro_commas,"!!!!!!!"),0),mvindex(split(macro_commas,"!!!!!!!"),0) . "(" . mvindex(split(macro_commas,"!!!!!!!"),1) . ")")) \ +| lookup splunkadmins_macros title AS macroName, app AS app_name, splunk_server \ +| eval app_name2="global" \ + ```The original version just did an OUTPUTNEW definition, however this has the limitation that if 1 of the 5 macros found resolves, output stops. And this can result in missing macros. So this version over-matches but that appears to be the tradeoff...without making this even more complicated``` \ +| lookup splunkadmins_macros title AS macroName, app AS app_name2, splunk_server OUTPUT definition AS definition2 \ +| lookup splunkadmins_macros title AS macroName, splunk_server OUTPUT definition AS definition3 \ +| eval definition=mvdedup(mvappend(definition,definition2,definition3)) \ +| fillnull definition value="macronotfound" \ +| nomv definition \ +| eval definition=" " . definition . " " \ + ```While an mvmap could replace per-macro that results in a multivalue output. Also replace doesn't handle a multivalued replacement argument so just replace the first macro if it exists with the definitions of all the macros, close enough for what we want``` \ +| eval search=if(isnotnull(macro_name),replace($fieldname$,mvindex(macro_name,0),definition),$fieldname$) +iseval = 0 + +#Note this macro requires TA-webtools +#Alternatively the "Mothership app" on SplunkBase can be used for this purpose... 
+[splunkadmins_remote_macros(3)] +args = url,user,pass +definition = | curl method=get uri="$url$/servicesNS/-/-/configs/conf-macros?count=-1&output_mode=json" user=$user$ pass=$pass$\ +| spath input=curl_message path="entry{}.name" output=title\ +| spath input=curl_message path="entry{}.acl.app" output=app\ +| spath input=curl_message path="entry{}.content.definition" output=definition\ +| spath input=curl_message path="entry{}.acl.sharing" output=sharing\ +| fields - curl_* \ +| fields title, app, definition, sharing \ +| eval data=mvzip(mvzip(mvzip(title, 'app', "%%%%"),definition,"%%%%"),sharing,"%%%%")\ +| fields data \ +| mvexpand data \ +| makemv data delim="%%%%" \ +| eval title=mvindex(data,0),app=mvindex(data,1), definition=mvindex(data,2), sharing=mvindex(data,3)\ +| search sharing!=user\ +| fields - data +iseval = 0 + +#Not currently in use by searches but attempts to pull the roles from a remote Splunk server +#Alternatively the "Mothership app" on SplunkBase can be used for this purpose... +[splunkadmins_remote_roles(3)] +args = url,user,pass +definition = | curl method=get uri="$url$/services/authentication/users?output_mode=json&count=0&f=roles" user="$user$" pass="$pass$"\ +| rex field=curl_message max_match=10000 "{\"name\":\"(?P[^\"]+)\".*?\"roles\":\[(?P[^\]]+)" \ +| fields - curl_* \ +| eval data=mvzip(user,roles,"%%%%") \ +| mvexpand data \ +| table data \ +| makemv data delim="%%%%" \ +| eval user=mvindex(data,0), roles=mvindex(data,1)\ +| fields - data\ +| eval roles=replace(roles,"\"","")\ +| makemv roles delim="," +iseval = 0 + +#Macro to determine search head cluster name, potentially using a case statement or similar +[search_head_cluster] +definition = "default" +iseval = 0 + +#Macro to determine which indexer cluster name, potentially using a case statement or similar +[indexer_cluster_name(1)] +args = indexer +definition = "default" +iseval = 0 + +#Macro to define indexer cluster name +[indexer_cluster_name] +definition = "default" +iseval = 0 + +[forwarder_name(1)] +args = hostname +definition = "default" +iseval = 0 + +[search_type_from_sid(1)] +args = search_id +definition = eval from=null(), username=null(), searchname2=null(), searchname=null()\ +| rex field=$search_id$ "'?(_rt)?(_?subsearch)*_?(?P[^_]+)((_(?P[^_]+))|(__(?P[^_]+)))((__(?P[^_]+)__(?P[^_]+))|(_(?P[^_]+)__(?P[^_]+)))"\ +| rex field=$search_id$ "^_?(?PSummaryDirector)"\ + ```Pattern appears to vary but remote__ is consistent along with the optional _subsearch, the _from can be __ownername__appname__RMD for dashboards as one pattern, it can also be unixepoch (ad-hoc), or scheduler__username__appname (scheduled search), or username__owner__(something)__dashboardview, among others. RMD values can be translated via audit.log, scheduler.log or remote_searches.log (if savedsearch_name is there)!```\ +| fillnull from value="adhoc"\ +| eval searchname=coalesce(searchname,searchname2)\ +| eval type=case(from=="scheduler","scheduled",from=="SummaryDirector","acceleration",match(search_id,"^'?alertsmanager_"),"scheduled",isnotnull(searchname),"dashboard",1=1,"ad-hoc") +iseval = 0 + +[base64decode(1)] +args = afield +definition = eval $afield$=null() ```As per https://docs.splunk.com/Documentation/Splunk/latest/Report/Createandeditreports usernames/apps can be base64 encrypted, remove the eval when ready to use this...decrypt2 (splunkbase) can be used to decrypt with (remove the backslashes): eval $afield$=$afield$ . 
"===" | decrypt field=$afield$ atob emit('$afield$')``` +iseval = 0 + +[dashboard_depends_filter1] +definition = "" +iseval = 0 + +[dashboard_depends_filter2] +definition = ```potentially a where clause to only filter when a certain number of tokens exist...``` "" +iseval = 0 + +[dashboard_depends_filter3] +definition = ```potentially a where clause to only filter when a certain number of tokens were matched or similar...``` "" +iseval = 0 + +[splunkadmins_wineventlog_index] +definition = wineventlog +iseval = 0 + +[splunkadmins_unexpected_term_count] +definition = 5 +iseval = 0 + +#Note getsize=true appears to be added in 7.3.3+ and above so this will only work on newer versions and only for lookup definitions +#the /admin/file-explorer/ will work for all CSV files but is admin only so using this option as a macro... +[mylookups] +definition = rest splunk_server=local /servicesNS/-/-/admin/transforms-lookup getsize=true \ +| search [| rest /services/authentication/current-context/context splunk_server=local | head 1 | fields username | rename username AS eai:acl.owner] \ +| eval name = 'eai:acl.app' + "." + title \ +| rename "eai:acl.sharing" AS sharing \ +| table name type size sharing \ +| sort - size + +[splunkadmins_tailreader_ignorepath] +definition = "" +iseval = 0 + +[splunkadmins_splunk_server_name] +definition = "default" + +[splunkadmins_audit_alltime] +definition = "" + +[splunkadmins_dashboards_alltime] +definition = "" + +#Just a nicer way to format the returned data from the conf-props or conf-similar (borrowed from slack) +[conf_rest_endpoint(1)] +args = endpoint +definition = rest /services/configs/conf-$endpoint$ splunk_server=local \ +| eval _raw="", acl="" \ +| foreach "*" \ + [| eval field=if(match("<>","^(title|eai:|splunk_server|author|id|updated|published)"),"","<> = ".'<>') \ + | eval acl_field=if(match("<>","^(eai:|author|updated|published)"),"<> = ".'<>',"") \ + | eval _raw=mvappend(_raw,field) \ + | eval acl=mvappend(acl,acl_field)] \ +| fields splunk_server title _raw acl \ +| eval _raw=mvappend("[".title."]",_raw) +iseval = 0 + +[splunkadmins_excessive_rest_api_httplib] +definition = "Python-httplib2/0.13.1 (gzip)" + +[splunkadmins_excessive_rest_api_threshold] +definition = 100 + +#Convert a time string into epoch time +[splunkadmins_epoch(1)] +args = time +definition = strptime("$time$","%Y-%m-%d %T") +iseval = 1 + +[splunkadmins_audit_logs_datamodel_sub] +definition = eval definition=null(), datamodel3=null(), datamodel1=null(), datamodel2=null()\ +| rex field=search "^\s*\|\s*((from\s+datamodel\s*:?\s*\"?(?P[^\"\.\s]+))|(datamodel\s+\"?(?P[^\s\"\.]+)\"?\s+[^\|]*search))" \ +| rex field=search "datamodel\s*=\s*\"?(?P[^\s\"\.]+)" \ +| eval datamodel_res=case(isnotnull(datamodel3) AND match(search,"\s*\|\s*(tstats)"),datamodel3,isnotnull(datamodel1),datamodel1,isnotnull(datamodel2),datamodel2,true(),null()) \ +| lookup splunkadmins_datamodels datamodel AS datamodel_res, app AS app_name, splunk_server OUTPUT definition\ +| eval app_name2="global"\ +| lookup splunkadmins_datamodels datamodel AS datamodel_res, app AS app_name2, splunk_server OUTPUTNEW definition\ +| lookup splunkadmins_datamodels datamodel AS datamodel_res, splunk_server OUTPUTNEW definition\ +| nomv definition \ +| eval definition=" " . definition . " "\ + ```While an mvmap could replace per-datamodel that results in a multivalue output. 
Also replace doesn't handle a multivalued replacement argument so just replace the first macro if it exists with the definitions of all the datamodels``` \ +| eval search=if(isnotnull(datamodel_res),replace(search,mvindex(datamodel_res,0),definition),search) +iseval = 0 + +[splunkadmins_audit_logs_tags_sub] +definition = eval pretag=null(), tag=null(), definition=null(), definition2=null(), definition3=null() \ +| rex field=search max_match=50 "(?Ptag\s*=\s*)(?P[^\s\)\"]+)" \ +| lookup splunkadmins_tags tag, app AS app_name, splunk_server OUTPUT definition \ +| eval app_name2="global" \ +| lookup splunkadmins_tags tag, app AS app_name2, splunk_server OUTPUT definition AS definition2 \ +| lookup splunkadmins_tags tag, splunk_server OUTPUT definition AS definition3 \ +| eval definition=mvdedup(mvappend(definition, definition2, definition3)) \ +| nomv definition \ +| eval search=if(isnotnull(definition),replace(search,pre_tag . tag," " . definition . " "),search) +iseval = 0 + +[splunkadmins_audit_logs_eventtypes_sub] +definition = eval pre_eventtype=null(), eventtype=null(), eventtype2=null(), definition=null(), definition2=null(), definition3=null() \ +| rex field=search max_match=20 "(?Peventtype\s*=\s*)((\"(?P[^\"]+))|((?P[^\s\)]+)))" \ +| eval eventtype=coalesce(eventtype,eventtype2) \ +| lookup splunkadmins_eventtypes eventtype, app AS app_name, splunk_server OUTPUT definition \ +| eval app_name2="global" \ +| lookup splunkadmins_eventtypes eventtype, app AS app_name2, splunk_server OUTPUT definition AS definition2 \ +| lookup splunkadmins_eventtypes eventtype, splunk_server OUTPUT definition AS definition3 \ +| eval definition=mvdedup(mvappend(definition, definition2, definition3)) \ +| nomv definition \ +| eval search=if(isnotnull(definition),replace(search,pre_eventtype . "\"?" . eventtype," " . definition . " "),search) +iseval = 0 + +[splunkadmins_slowpeer_time] +definition = 60 +iseval = 0 + +[splunkadmins_slowpeer_threshold] +definition = 10 +iseval = 0 + +[splunkadmins_searchmessages_user_1] +definition = "" +iseval = 0 + +[splunkadmins_searchmessages_user_2] +definition = "" +iseval = 0 + +[splunkadmins_searchmessages_admin_1] +definition = "" +iseval = 0 + +[splunkadmins_searchmessages_admin_2] +definition = "" +iseval = 0 + +[splunkadmins_splunkd_log_messages] +definition = "" +iseval = 0 + +[splunkadmins_alertactions_max_action_results] +definition = "" +iseval = 0 + +[splunkadmins_authorize_conf_prevent_users] +definition = role!="can_delete" +iseval = 0 + +[splunkadmins_indexer_remotesearches_alltime] +definition = host=localhost +iseval = 0 + +[splunkadmins_dataparsing_error] +definition = "" +iseval = 0 + +[splunkadmins_shutdown_time_by_shc(3)] +args = macroName, minTimeContingency, maxTimeContingency +definition = search ```Send an exclusion list in terms of a search result for the time when any SH was shutdown```\ +index=_internal (`$macroName$`) sourcetype=splunkd `splunkadmins_splunkd_source` (CASE("Shutting down")) OR "Shutdown complete in" OR "Received shutdown signal." OR "Shutdown signal received" OR "master has instructed peer to restart" OR "Performing early shutdown tasks"\ +| eval message=coalesce(message,event_message)\ +| stats min(_time) AS logTime by message, host\ +| eval search_head=host\ +| eval search_head_cluster=`search_head_cluster`\ +| stats min(logTime) AS minTime, max(logTime) AS maxTime by search_head_cluster\ +| eval minTime=minTime - $minTimeContingency$, maxTime=maxTime + $maxTimeContingency$\ +| eval search=" _time>" . minTime . 
" _time<" .maxTime . " search_head_cluster=" . search_head_cluster\ +| fields search\ +| format\ +| rex mode=sed field=search "s/\"//g" +iseval = 0 + +[splunkadmins_indexerqueue_count] +definition = 1 +iseval = 0 + +[splunkadmins_deploymentserver_splunkserver] +definition = splunk_server=local +iseval = 0 + +[splunkadmins_sh_knowledgebundle_metrics_filter] +definition = where replication_time_msec>200000 +iseval = 0 + +[splunkadmins_sh_knowledgebundle_metrics_timespan] +definition = 60m +iseval = 0 + +[splunkadmins_bundlepush_span] +definition = 10m +iseval = 0 + +[splunkadmins_metrics_source] +definition = source=*metrics.log* +iseval = 0 + +[splunkadmins_hec_metrics_source] +definition = source=*http_event_collector_metrics.log* +iseval = 0 + +[splunkadmins_summaryindex_durablesearch] +definition = NOT title IN ("SearchHeadLevel - summary indexing searches not using durable search") next_scheduled_time!="" +iseval = 0 + +[splunkadmins_events_per_second] +definition = desc.savedsearch_name IN ("Example") +iseval = 0 diff --git a/apps/SplunkAdmins/default/props.conf b/apps/SplunkAdmins/default/props.conf new file mode 100644 index 00000000..6d7da6c4 --- /dev/null +++ b/apps/SplunkAdmins/default/props.conf @@ -0,0 +1,39 @@ +#Splunk does not index the search.log files from the dispatch directory by default +#so create a stanza to take only the parts we care about... +#Example lines to look for include +#05-24-2018 08:31:03.881 ERROR SearchResultTransaction - Got status 502 from https://x.x.x.x:8089/services/streams/search?sh_sid=1527150641.164891_315974D3-2FA6-4A16-839A-A95A0376BA14 +#05-24-2018 08:31:03.881 ERROR SearchResultTransaction - HTTP error status message from https://x.x.x.x:8089/services/streams/search?sh_sid=1527150641.164891_315974D3-2FA6-4A16-839A-A95A0376BA14: Error connecting: Connect Timeout +#05-24-2018 08:31:03.881 ERROR DispatchThread - sid:1527150641.164891_315974D3-2FA6-4A16-839A-A95A0376BA14 Unknown error for peer indexername. Search Results might be incomplete. If this occurs frequently, please check on the peer. +#05-28-2018 00:52:17.245 INFO DispatchThread - sid:1527468707.34320_315974D3-DFFC-48EC-86C8-33BD6744EE4F Search auto-finalized after time limit (30 seconds) reached. +#however a better alternative may be [search] +#log_search_messages = true +#In the limits.conf file and then use the search_messages.log file... +[splunk:searchlog] +TIME_PREFIX = ^ +TIME_FORMAT = %m-%d-%Y %H:%M:%S.%3N +SHOULD_LINEMERGE = false +TRANSFORMS-set = setNull,setError,setAutoFinalize + +#Example inputs.conf if you want to use the above in Linux +#[monitor:///opt/splunk/var/run/splunk/dispatch/*/search.log] +#sourcetype = splunk:searchlog +#index = _internal + +#Splunk records failures from search heads to indexer for corrupt buckets in the info.csv log only on the search head level +#the search.log on the indexer peers *will* record this so if your ingesting the search.log from the peers you probably don't need this one... +#The info.csv does show you what the end user will see in terms of errors such as this... +#Examples include: +#,,,,,,,,,,,,,,,,,ERROR,"[hostname] Failed to read size=1 event(s) from rawdata in bucket='_internal~43~E21ADB4E-02B7-4877-8A42-A15CE7F422BD' path='.../db_1515304396_1515080916_.... Rawdata may be corrupt, see search.log. Results may be incomplete!","{}",,,,,,, +#Note that a better alternative may be [search] +#log_search_messages = true +#In the limits.conf file and then use the search_messages.log file... 
+[splunk:search:info] +SHOULD_LINEMERGE = false +DATETIME_CONFIG = NONE +TRANSFORMS-set = setNull,setWARNorERROR,setAutoFinalize + +#Example inputs.conf if you want to use the above in Linux +#[monitor:///opt/splunk/var/run/splunk/dispatch/*/info.csv] +#sourcetype = splunk:search:info +#index = _internal +#crcSalt = diff --git a/apps/SplunkAdmins/default/savedsearches.conf b/apps/SplunkAdmins/default/savedsearches.conf new file mode 100644 index 00000000..2847d5e8 --- /dev/null +++ b/apps/SplunkAdmins/default/savedsearches.conf @@ -0,0 +1,8773 @@ +[SearchHeadLevel - Accelerated DataModels with All Time Searching Enabled] +action.keyindicator.invert = 0 +alert.suppress = 0 +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 4 +counttype = number of events +cron_schedule = 0 7 * * * +description = Chance the alert requires action? High. Having an accelerated data model running searches every 5 minutes over all time can cause serious issues. Search Head specific? Yes +dispatch.earliest_time = -1h +dispatch.latest_time = now +display.events.fields = ["host","source","sourcetype"] +display.general.type = statistics +display.page.search.tab = statistics +display.visualizations.charting.chart = area +enableSched = 1 +quantity = 0 +relation = greater than +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = | rest /servicesNS/-/-/data/models `splunkadmins_restmacro` search=acceleration=1 search=acceleration.earliest_time=0 search=disabled=0 f=eai:data f=eai:acl* \ +| fields eai:acl.app, eai:acl.owner, eai:acl.sharing, title, updated \ +| rename eai:acl.app AS app, eai:acl.owner AS owner, eai:acl.sharing AS sharing +disabled = 1 + +[AllSplunkEnterpriseLevel - Email Sending Failures] +action.keyindicator.invert = 0 +alert.suppress = 0 +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 3 +counttype = number of events +cron_schedule = 3 * * * * +description = Chance the alert requires action? High. Ideally this action shouldn't be using email but this should fire when the email server is throwing errors +dispatch.earliest_time = -1h@h +dispatch.latest_time = now +display.events.fields = ["host","source","sourcetype"] +display.general.type = statistics +display.page.search.tab = statistics +display.visualizations.charting.chart = area +enableSched = 1 +quantity = 0 +relation = greater than +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = ```Find any failures to send emails due to either the size of the email or the email server not working or similar```\ +index=_internal `splunkenterprisehosts` "stderr from " python* sendemail.py sourcetype=splunkd (`splunkadmins_splunkd_source`)\ +| eval message=coalesce(message,event_message)\ +| dedup message \ +| rex "ssname=(?P[^\"]+)"\ +| rex "stderr from '[^']+':\s+(?P.*)"\ +| rex field=results_link "/app/(?P[^/]+)" \ +| rex field=results_file ".*/dispatch/[^_]+__(?P[^_]+)"\ +| fillnull value="N/A" app \ +| eval time=strftime(_time, "%+")\ +| stats count, values(time) AS time by error, savedsearch, user, app\ +| table time, count, error, savedsearch, user, app +disabled = 1 + +[AllSplunkEnterpriseLevel - Splunk Servers throwing runScript errors] +action.keyindicator.invert = 0 +alert.suppress = 0 +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 2 +counttype = number of events +cron_schedule = 57 10 * * * +description = Chance the alert requires action? Low. Splunk Enterprise servers are throwing an error related to running a script, this may or may not be an issue... 
+dispatch.earliest_time = -24h@h +dispatch.latest_time = now +display.events.fields = ["host","source","sourcetype"] +display.general.type = statistics +display.page.search.tab = statistics +enableSched = 1 +quantity = 0 +relation = greater than +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = ```runScript errors are an indicator of a potential issue with an application```\ +index=_internal `splunkenterprisehosts` sourcetype=splunkd (`splunkadmins_splunkd_source`) \ +("ERROR ScriptRunner" "stderr from '*python* *runScript.py execute'") OR ("ERROR ExecProcessor" "message from \"python* *ERROR*")\ +```Do not include INFO level messages from standard error/out```\ +NOT " INFO " `splunkadmins_runscript` \ +| cluster showcount=true \ +| fields host, _raw, cluster_count +disabled = 1 + +[AllSplunkEnterpriseLevel - Splunkd Crash Logs Have Appeared in Production] +action.keyindicator.invert = 0 +alert.suppress = 0 +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 4 +counttype = number of events +cron_schedule = 47 3 * * 1 +description = Chance the alert requires action? High. Production crashes are usually a problem +dispatch.earliest_time = -7d@d +dispatch.latest_time = now +display.events.fields = ["host","source","sourcetype","title","severity"] +display.general.type = statistics +display.page.search.mode = fast +display.page.search.patterns.sensitivity = 0.3 +display.page.search.tab = statistics +display.visualizations.charting.chart = area +enableSched = 1 +quantity = 0 +relation = greater than +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = ```crash logs on the Splunk enterprise servers are usually an issue, this may require a support ticket```\ +index=_internal `splunkenterprisehosts` sourcetype=splunkd_crash_log\ +| top source, host, sourcetype +disabled = 1 + +[AllSplunkEnterpriseLevel - ulimit on Splunk enterprise servers is below 8192] +action.keyindicator.invert = 0 +alert.suppress = 0 +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 4 +counttype = number of events +cron_schedule = 17 3 * * * +description = Chance the alert requires action? High. ulimit should be 64000 or above as per the http://docs.splunk.com/Documentation/Splunk/latest/Installation/Systemrequirements. Note this is replaced by MonitoringConsole - Check OS ulimits via REST if you want to use REST instead... 
+dispatch.earliest_time = -24h +dispatch.latest_time = now +display.events.fields = ["source","sourcetype","host"] +display.visualizations.charting.chart = bar +enableSched = 1 +quantity = 0 +relation = greater than +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = ```Any Splunk enterprise servers running less than 64000 file descriptors can result in a crash, therefore we watch the ulimit numbers on startup``` \ +```You could do | rest /services/server/sysinfo | table ulimit* or similar but that will not cover all Splunk enterprise servers that are in the _internal index...```\ +index=_internal ("ulimit" "open files:") OR ("fd limit" "lower") ( `splunkenterprisehosts` sourcetype=splunkd (`splunkadmins_splunkd_source`) ) \ +| rex "(?P\d+) files" \ +| where nooffiles<64000 OR searchmatch("fd limit")\ +| fields _time, _raw, host +disabled = 1 + +[AllSplunkLevel - Application Installation Failures From Deployment Manager] +action.keyindicator.invert = 0 +alert.suppress = 0 +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 3 +counttype = number of events +cron_schedule = 0 9 * * * +description = Chance the alert requires action? Moderate. Applications have failed to install from the deployment server and this may require investigation +dispatch.earliest_time = -1d +dispatch.latest_time = now +display.events.fields = ["host","source","sourcetype"] +display.general.type = statistics +display.page.search.tab = statistics +enableSched = 1 +quantity = 0 +relation = greater than +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = ```Deployment clients can pull applications down but they may not install the application so we watch for this error to see if an application failed to install``` \ +index=_internal sourcetype=splunkd (`splunkadmins_splunkd_source`) OR (`splunkadmins_splunkuf_source`) action=Install OR action=download log_level=WARN OR log_level=ERROR\ +| cluster t=0.9 showcount=true \ +| where cluster_count>1 \ +| lookup dnslookup clientip as ip \ +| eval message=coalesce(message, event_message) \ +| table cluster_count, clienthost, app, ip, component, message +disabled = 1 + +[AllSplunkLevel - Time skew on Splunk Servers] +action.keyindicator.invert = 0 +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 1 +alert.suppress = 1 +alert.suppress.period = 3d +counttype = number of events +cron_schedule = 53 2,6,10,14,18,22 * * * +description = Chance the alert requires action? Moderate. A time skew should not exist, if we see this alert then something is not working in NTP... 
+dispatch.earliest_time = -4h@h +dispatch.latest_time = now +display.events.fields = ["host","source","sourcetype"] +display.general.type = statistics +display.page.search.mode = fast +display.page.search.tab = statistics +display.visualizations.charting.chart = line +enableSched = 1 +quantity = 0 +relation = greater than +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = ```A time skew issue likely shows an issue on the endpoint forwarder rather than a Splunk server but it is useful to watch for```\ +index=_internal "A time skew of approximately" (`splunkadmins_splunkd_source`) OR (`splunkadmins_splunkuf_source`) sourcetype=splunkd `splunkadmins_timeskew`\ +| rex "Peer:https?://(?P[^:]+).*A time skew of approximately (?P-?\d+)"\ +| eval negativeSeconds=if(substr(seconds,0,1)="-","true", "false"), seconds=abs(seconds)\ +| stats values(negativeSeconds) AS negativeSeconds, first(_raw) AS _raw, values(host) AS reportingHost, max(_time) AS lastSeen, min(_time) AS firstSeen, avg(seconds) AS avgSkew, max(seconds) AS maxSkew by hostname \ +| eval avgSkew=if(negativeSeconds="true","-" . avgSkew,avgSkew), maxSkew=if(negativeSeconds="true","-" . maxSkew,maxSkew) \ +| eval lastSeen = strftime(lastSeen, "%+"), firstSeen = strftime(firstSeen, "%+") \ +| table hostname, reportingHost, _raw, lastSeen, firstSeen, avgSkew, maxSkew +disabled = 1 + +[DeploymentServer - Application Not Found On Deployment Server] +action.keyindicator.invert = 0 +alert.suppress = 0 +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 3 +counttype = number of events +cron_schedule = 14 * * * * +description = Chance the alert requires action? High. The application was not found on the deployment server or another deployment server error occurred +dispatch.earliest_time = -1h@h +dispatch.latest_time = now +display.events.fields = ["host","source","sourcetype"] +display.general.type = statistics +display.page.search.tab = statistics +display.visualizations.charting.chart = area +enableSched = 1 +quantity = 0 +relation = greater than +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = ```This usually indicates a misconfigured serverclass.conf or a missing application from the deployment-apps directory``` \ +index=_internal `deploymentserverhosts` "ERROR Serverclass" "Failed to load app." sourcetype=splunkd (`splunkadmins_splunkd_source`) \ +| bin _time span=20m \ +| top Application, path, _time +disabled = 1 + +[DeploymentServer - Forwarder has changed properties on phone home] +action.keyindicator.invert = 0 +alert.suppress = 0 +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 1 +counttype = number of events +cron_schedule = 37 6 * * 3 +description = Chance the alert requires action? Moderate. 
Only detect when a forwarder has switched IP's or something strange has happened, ignore multiple DNS names for the same IP +dispatch.earliest_time = -7d@d +dispatch.latest_time = now +display.events.fields = ["host","source","sourcetype"] +display.general.type = statistics +display.page.search.tab = statistics +display.visualizations.charting.chart = area +enableSched = 1 +quantity = 0 +relation = greater than +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = ```This looks for unusual changes on the phone home to the deployment server, this alert can also be completely harmless``` \ +index=_internal `deploymentserverhosts` "has changed some of its properties on the latest phone home.Old properties are" (`splunkadmins_splunkd_source`) sourcetype=splunkd `splunkadmins_changedprops`\ +| rex "Client with Id '(?P[^']+)" \ +| sort clientid \ +| eventstats count by clientid \ +| where count>`splunkadmins_changedprops_count` \ +| stats values(ip) AS "IP List", values(dns) AS "DNS names", values(hostname) AS "Hostname List", values(uts) AS uts by name \ +| eval numberOfIPs=mvcount("IP List"), numberOfHostnames=mvcount("Hostname List") \ +| search ```Having multiple DNS names for an IP address is almost normal here, however multiple IPs or hostnames might be a real issue. Ignoring multiple DNS names only``` numberOfIPs>1 OR numberOfHostnames>1 +disabled = 1 + +[DeploymentServer - btool validation failures occurring on deployment server] +action.keyindicator.invert = 0 +alert.suppress = 0 +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 2 +counttype = number of events +cron_schedule = 28 11 * * * +description = Chance the alert requires action? Moderate. Email about any btool validation errors on the deployment server +dispatch.earliest_time = -1d@d +dispatch.latest_time = now +display.events.fields = ["host","source","sourcetype"] +display.visualizations.charting.chart = area +enableSched = 1 +quantity = 0 +relation = greater than +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = ```This alert detects when the deployment server is throwing some kind of warning about an application it is going to deploy. The exclusion list includes lines that are not really relevant as they appear later in the log entries``` \ +index=_internal `deploymentserverhosts` "WARN Application" sourcetype=splunkd (`splunkadmins_splunkd_source`)\ + NOT "There were the following errors in btool check:" \ +`splunkadmins_btoolvalidation_ds` \ +| eval message=coalesce(message,event_message)\ +| dedup message | fields _time _raw host +disabled = 1 + +[ForwarderLevel - Bandwidth Throttling Occurring] +action.email.reportServerEnabled = 0 +action.keyindicator.invert = 0 +alert.suppress = 0 +alert.suppress.period = 12h +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 1 +auto_summarize.dispatch.earliest_time = -1d@h +counttype = number of events +cron_schedule = 37 02 * * * +description = Chance the alert requires action? High. 
Cases where the Splunk forwarder is delayed from sending the data to Splunk due to the maxKBps limit
+dispatch.earliest_time = -1d
+dispatch.latest_time = now
+display.general.type = statistics
+display.page.search.patterns.sensitivity = 0.3
+display.page.search.tab = statistics
+display.visualizations.charting.chart = bar
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```This alert detects universal or heavy forwarders that have hit the maxKBps setting in the limits.conf and might need to be investigated``` \
+index=_internal "has reached maxKBps. As a result, data forwarding may be throttled" sourcetype=splunkd (`splunkadmins_splunkd_source`) OR (`splunkadmins_splunkuf_source`) `splunkadmins_bandwidth`\
+| bin _time span=1h\
+| stats count as countPerHost by host, _time\
+| where countPerHost > 1
+disabled = 1
+
+[ForwarderLevel - File Too Small to checkCRC occurring multiple times]
+action.email.reportServerEnabled = 0
+action.keyindicator.invert = 0
+alert.suppress = 1
+alert.suppress.period = 12h
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 2
+auto_summarize.dispatch.earliest_time = -1d@h
+counttype = number of events
+cron_schedule = 0,15,30,45 * * * *
+description = Chance the alert requires action? Low. CRC checksum errors occurring multiple times may indicate a problem with the CRC checksum on the particular file; it's also possible we are seeing a zero sized file or a rolled file...
+dispatch.earliest_time = -15m
+dispatch.latest_time = now
+display.general.type = statistics
+display.page.search.patterns.sensitivity = 0.3
+display.page.search.tab = statistics
+display.visualizations.charting.chart = bar
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```An experimental alert to detect the seekcrc too small errors in the splunkd.log file occurring a bit too regularly``` \
+index=_internal "File too small to check seekcrc, probably truncated" \
+sourcetype=splunkd (`splunkadmins_splunkd_source`) OR (`splunkadmins_splunkuf_source`) `splunkadmins_toosmall_checkcrc`\
+```Older universal forwarders have a variety of logs that will never be more than zero sized, therefore this error is legitimate for them```\
+NOT (file="'/*/splunkforwarder/var/log/splunk/license_usage.log'" OR file="'/*/splunkforwarder/var/log/splunk/license_usage_summary.log'" OR file="'/*/splunkforwarder/var/log/splunk/mongod.log'" OR file="'/*/splunkforwarder/var/log/splunk/remote_searches.log'" OR file="'/*/splunkforwarder/var/log/splunk/scheduler.log'" OR file="'/*/splunkforwarder/var/log/splunk/searchhistory.log'" OR file="'/*/splunkforwarder/var/log/splunk/splunkd_ui_access.log'" OR file="'/*/splunkforwarder/var/log/splunk/crash-*'" OR file="'/*/splunkforwarder/var/log/splunk/btool.log'" OR file="'/*/splunkforwarder/var/log/splunk/license_audit.log'")\
+```Older windows based universal forwarders can also have these same zero sized log files, therefore this error is legitimate for them```\
+NOT (file="'\\*\\SplunkUniversalForwarder\\var\\log\\splunk\\license_usage.log'" OR file="'\\*\\SplunkUniversalForwarder\\var\\log\\splunk\\license_usage_summary.log'" OR file="'\\*\\SplunkUniversalForwarder\\var\\log\\splunk\\mongod.log'" OR file="'\\*\\SplunkUniversalForwarder\\var\\log\\splunk\\remote_searches.log'" OR file="'\\*\\SplunkUniversalForwarder\\var\\log\\splunk\\scheduler.log'" OR
file="'\\*\\SplunkUniversalForwarder\\var\\log\\splunk\\searchhistory.log'" OR file="'\\*\\SplunkUniversalForwarder\\var\\log\\splunk\\splunkd_ui_access.log'" OR file="'\\*\\SplunkUniversalForwarder\\var\\log\\splunk\\crash-*'" OR file="'\\*\\SplunkUniversalForwarder\\var\\log\\splunk\\btool.log'" OR file="'\\*\\SplunkUniversalForwarder\\var\\log\\splunk\\license_audit.log'")\ +```Splunk enterprise instances running on non-official hostnames```\ +NOT (file="'/opt/splunk/var/log/splunk/license_usage.log'" OR file="'/opt/splunk/var/log/splunk/license_usage_summary.log'" OR file="'/opt/splunk/var/log/splunk/mongod.log'" OR file="'/opt/splunk/var/log/splunk/remote_searches.log'" OR file="'/opt/splunk/var/log/splunk/scheduler.log'" OR file="'/opt/splunk/var/log/splunk/searchhistory.log'" OR file="'/opt/splunk/var/log/splunk/splunkd_ui_access.log'" OR file="'/opt/splunk/var/log/splunk/crash-*'" OR file="'/opt/splunk/var/log/splunk/btool.log'" OR file="'/opt/splunk/var/log/splunk/license_audit.log'")\ +```Regex for filename now replaces the default field extraction due to Windows based filenames containing spaces..```\ +| rex "file=(?P.+)\)\."\ +| stats sum(linecount) as numberOfEntries by host, file\ +| where numberOfEntries > 10 +disabled = 1 + +[ForwarderLevel - Forwarders in restart loop] +action.keyindicator.invert = 0 +alert.suppress = 1 +alert.suppress.period = 60m +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 2 +auto_summarize.dispatch.earliest_time = -1d@h +counttype = number of events +cron_schedule = 0,15,30,45 * * * * +description = Chance the alert requires action? Moderate. Attempt to detect universal forwarders that are restarting too often +dispatch.earliest_time = -15m +dispatch.latest_time = now +display.events.fields = ["host","source","sourcetype"] +display.general.type = statistics +display.page.search.patterns.sensitivity = 0.3 +display.page.search.tab = statistics +display.visualizations.charting.chart = bar +enableSched = 1 +quantity = 0 +relation = greater than +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = ```If a forwarder restarts more than 5 times in 15 minutes there might be a problematic script that is restarting it too often``` \ +index=_internal "Received shutdown signal." sourcetype=splunkd (`splunkadmins_splunkuf_source`)\ +| stats count as restartCount by host \ +| where restartCount > 5 +disabled = 1 + +[ForwarderLevel - SSL Errors In Logs (Potential Universal Forwarder and License Issue)] +action.keyindicator.invert = 0 +alert.suppress = 0 +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 1 +counttype = number of events +cron_schedule = 53 22 * * * +description = Chance the alert requires action? Moderate. SSL errors from Windows forwarder sin the past have resulted in duplication and excessive license usage, this alert exists to detect this scenario. +dispatch.earliest_time = -24h@h +dispatch.latest_time = now +display.events.fields = ["source","sourcetype","host"] +display.general.type = statistics +display.page.search.tab = statistics +display.visualizations.charting.chart = line +enableSched = 1 +quantity = 0 +relation = greater than +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = ```Excessive SSL errors may relate to a bug in the universal forwarder, if the SSL errors relate to duplication this could cause a license usage issue``` \ +index=_internal sourcetype=splunkd (`splunkadmins_splunkuf_source`) NOT (`splunkenterprisehosts`)\ +"sock_error = 10054. 
SSL Error = error:00000000:lib(0):func(0):reason(0)"\ +| top limit=500 host +disabled = 1 + +[ForwarderLevel - Splunk Forwarder Down] +action.keyindicator.invert = 0 +alert.suppress = 0 +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 1 +auto_summarize.dispatch.earliest_time = -1d@h +counttype = number of events +cron_schedule = 0 * * * * +description = Chance the alert requires action? Low. Splunk Forwarders Down (excluding timeshift servers and AWS cloud forwarders) +dispatch.earliest_time = -4h +dispatch.latest_time = now +display.events.fields = ["host","source","sourcetype"] +display.general.type = statistics +display.page.search.mode = verbose +display.page.search.tab = statistics +display.statistics.drilldown = row +display.visualizations.charting.chart = line +enableSched = 1 +quantity = 0 +relation = greater than +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = | metadata type=hosts index=_internal \ +| search ```Find forwarders that have recently stopped talking to the indexers / appear to be down```\ +NOT (`splunkenterprisehosts`) `splunkadmins_forwarderdown`\ +| eval age=now()-recentTime | eval status=if(age<1200,"UP","DOWN") \ +| eval "Last Active On"=strftime(recentTime, "%+") \ +| rename age as Age \ +| eval Hour=round(Age/3600,0)\ +| eval Minute=round((Age%3600)/60,0)\ +| eval Age="-".Hour."h"." : ".Minute."m" \ +| table host, status, "Last Active On", Age \ +| search status=DOWN \ +| lookup dnslookup clienthost AS host +disabled = 1 + +[ForwarderLevel - Splunk HTTP Listener Overwhelmed] +action.keyindicator.invert = 0 +alert.suppress = 0 +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 3 +counttype = number of events +cron_schedule = 53 3 * * * +description = Chance the alert requires action? High. HTTP listeners should not be overwhelmed with incoming connections, the thread/socket limits may have been reached +dispatch.earliest_time = -24h@h +dispatch.latest_time = now +display.events.fields = ["source","sourcetype","host"] +display.visualizations.charting.chart = bar +enableSched = 1 +quantity = 0 +relation = greater than +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = ```Find if the HTTP listener cannot cope with the incoming load of data. Refer to https://docs.splunk.com/Documentation/Splunk/latest/Troubleshooting/HTTPthreadlimitissues for more information``` \ +index=_internal HttpListener "Can't handle request for" sourcetype=splunkd (`splunkadmins_splunkd_source`) `splunkenterprisehosts` +disabled = 1 + +[ForwarderLevel - Splunk Heavy logging sources] +action.keyindicator.invert = 0 +alert.suppress = 0 +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 1 +auto_summarize.dispatch.earliest_time = -1d@h +counttype = number of events +cron_schedule = 4,34 * * * * +description = Chance the alert requires action? Low. Sources that are sending a large amount of log data... 
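+# Not part of the original alert: a minimal ad-hoc sketch of the same check for manual use,
+# assuming the default license_usage.log source rather than the `splunkadmins_license_usage_source`
+# macro; b/s/h/idx/st are the standard fields of that log and the 500MB-per-30-minutes
+# threshold mirrors the where clause in the scheduled search below:
+#   index=_internal source=*license_usage.log type=Usage earliest=-30m
+#   | stats sum(b) AS totalBytes by s, h, idx, st
+#   | where round(totalBytes/1024/1024) > 500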
+dispatch.earliest_time = -30m@m
+dispatch.latest_time = now
+display.general.type = statistics
+display.page.search.tab = statistics
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```Find splunk sources sending excessive amounts of logs in and then conditionally email the right team members``` \
+index=_internal `splunkadmins_license_usage_source` type=Usage `licensemasterhost` sourcetype=splunkd `splunkadmins_heavylogging`\
+| stats sum(b) as totalBytes by s, h, idx, st \
+| eval totalMBInPast30Mins=round(totalBytes/1024/1024) \
+| where totalMBInPast30Mins>500\
+| table s, h, idx, st, totalMBInPast30Mins
+disabled = 1
+
+[ForwarderLevel - Splunk Universal Forwarders Exceeding the File Descriptor Cache]
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 1
+counttype = number of events
+cron_schedule = 0 11 * * *
+description = Chance the alert requires action? Low. These forwarders may need an increase in their file descriptor cache limits
+dispatch.earliest_time = -1d
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype","title","severity"]
+display.general.type = statistics
+display.page.search.patterns.sensitivity = 0.3
+display.page.search.tab = statistics
+display.visualizations.charting.chart = line
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```The file descriptor cache full message is a potential indicator that we are monitoring directories with many files and this might cause the forwarder to utilize extra CPU```\
+index=_internal TailReader "File descriptor cache is full" (`splunkadmins_splunkd_source`) OR (`splunkadmins_splunkuf_source`) sourcetype=splunkd `splunkadmins_exceeding_filedescriptor`\
+| eval message=coalesce(message,event_message)\
+| stats values(message), count by host
+disabled = 1
+
+[ForwarderLevel - Splunk forwarders are having issues with sending data to indexers]
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 2
+auto_summarize.dispatch.earliest_time = -1d@h
+counttype = number of events
+cron_schedule = 28 * * * *
+description = Chance the alert requires action? Low. A low level of these alerts just means the indexer is busy / not receiving data fast enough; many alerts indicate the indexer is having serious issues.
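+# The trigger thresholds in the search below come from two macros; a sketch of overriding
+# them in a local/macros.conf on the search head (the macro names are taken from the search
+# itself, the numeric values are illustrative only and should be tuned per environment):
+#   [splunkadmins_sending_data_nonhf_count]
+#   definition = 10
+#   [splunkadmins_sending_data_hf_count]
+#   definition = 50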
+dispatch.earliest_time = -1h
+dispatch.latest_time = now
+display.events.fields = ["source","sourcetype","host"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.charting.chart = bar
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```A "could not send data to output queue" message from the indexers or heavy forwarders often indicates a performance issue``` \
+index=_internal sourcetype=splunkd (`splunkadmins_splunkd_source`) "Could not send data to output queue" (`indexerhosts`) OR (`heavyforwarderhosts`) `splunkadmins_sending_data`\
+| search ```Exclude shutdown times``` NOT [`splunkadmins_shutdown_time(indexerhosts,0,0)`]\
+| bin _time span=20m\
+| stats count by host, _time\
+| search (count>`splunkadmins_sending_data_nonhf_count` NOT `heavyforwarderhosts`) OR (count>`splunkadmins_sending_data_hf_count` `heavyforwarderhosts`)
+disabled = 1
+
+[ForwarderLevel - Splunk forwarders failing due to disk space issues]
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 4
+counttype = number of events
+cron_schedule = 45 * * * *
+description = Chance the alert requires action? High. A universal forwarder has run out of disk space
+dispatch.earliest_time = -1h
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.charting.chart = line
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```Detect universal forwarders that do not have any disk space left and therefore cannot work as expected``` \
+index=_internal sourcetype=splunkd (`splunkadmins_splunkd_source`) OR (`splunkadmins_splunkuf_source`) "No space left on device" \
+| top host
+disabled = 1
+
+[ForwarderLevel - Splunk universal forwarders with ulimit issues]
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 4
+auto_summarize.dispatch.earliest_time = -1d@h
+counttype = number of events
+cron_schedule = 0 10 * * 1
+description = Chance the alert requires action? High. Universal forwarder with ulimit issues
+dispatch.earliest_time = -1w
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.patterns.sensitivity = 0.3
+display.page.search.tab = statistics
+display.visualizations.charting.chart = line
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+schedule_window = 50
+search = ```Detect universal forwarders that have the ulimit set too low for the number of file descriptors (ulimit -n)``` \
+index=_internal log_level=WARN sourcetype=splunkd (`splunkadmins_splunkd_source`) OR (`splunkadmins_splunkuf_source`) (component=ulimit "Splunk may not work due to low file size limit") OR ("fd limit" "lower") \
+| dedup host \
+| fields host _raw
+disabled = 1
+
+[ForwarderLevel - Unusual number of duplication alerts]
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 1
+counttype = number of events
+cron_schedule = 07 22 * * *
+description = Chance the alert requires action? Low.
An unusual number of duplication alerts has appeared from these universal forwarders
+dispatch.earliest_time = -24h@h
+dispatch.latest_time = now
+display.events.fields = ["source","sourcetype","host"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.charting.chart = line
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```The number of warnings about duplication seems unusually high and may require investigation \
+The duplication warnings will occur with indexer acknowledgement enabled and indexer shutdowns. Other circumstances likely require some kind of investigation, the issue may also appear if the forwarder is having trouble getting CPU time...```\
+index=_internal sourcetype=splunkd (`splunkadmins_splunkuf_source`) NOT (`splunkenterprisehosts`) "duplication" `splunkadmins_unusual_duplication`\
+| search ```Exclude shutdown times``` NOT [`splunkadmins_shutdown_time(indexerhosts,60,60)`]\
+| stats count by host \
+| where count > `splunkadmins_unusual_duplication_count`
+disabled = 1
+
+[ForwarderLevel - crcSalt or initCrcLength change may be required]
+action.email.reportServerEnabled = 0
+action.keyindicator.invert = 0
+alert.suppress = 1
+alert.suppress.period = 12h
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 2
+auto_summarize.dispatch.earliest_time = -1d@h
+counttype = number of events
+cron_schedule = 53 2 * * *
+description = Chance the alert requires action? Low. The forwarder is advising that a crcSalt or initCrcLength change may be required on these files, therefore these should be investigated.
+dispatch.earliest_time = -24h@h
+dispatch.latest_time = now
+display.general.type = statistics
+display.page.search.patterns.sensitivity = 0.3
+display.page.search.tab = statistics
+display.visualizations.charting.chart = bar
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```Look for issues relating to CRC salt on any files...the universal forwarder settings may need tweaking to ensure the file is read as expected, or it may be a rolled file``` \
+index=_internal "You may wish to use larger initCrcLen for this sourcetype" sourcetype=splunkd (`splunkadmins_splunkd_source`) OR (`splunkadmins_splunkuf_source`) `splunkadmins_crcsalt_initcrc`\
+```Attempt to exclude rolled files from the check by looking for the most common pattern (.1, .2, .10 or similar) \
+This alert aims to find files where crcSalt = might be required in the inputs.conf file or a tweak to the initCrcLen...\
+Regex for filename now replaces the default field extraction due to Windows based filenames containing spaces...```\
+| rex "file=(?P<file>.+)\)\."\
+| regex file!="\.\d+$" \
+| eval message=coalesce(message,event_message)\
+| top limit=500 file, host, message
+disabled = 1
+
+[ForwarderLevel - Splunk Universal Forwarders that are time shifting]
+action.email.reportServerEnabled = 0
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 1
+counttype = number of events
+cron_schedule = 0 0 * * *
+description = Chance the alert requires action? Moderate.
The clock has changed many times on this server and may indicate a timeshifting test environment
+dispatch.earliest_time = -1d
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.patterns.sensitivity = 0.3
+display.page.search.tab = statistics
+display.visualizations.charting.chart = line
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```Detect universal forwarders that appear to be moving their clocks backwards/into the past or forwards, into the future. Timeshifting servers may need to be excluded from Splunk \
+The string "WARN TimeoutHeap - Either time adjusted forwards by, or event loop was descheduled for" looks similar but tends to relate to a poorly performing server rather than a time shift...```\
+index=_internal sourcetype=splunkd (`splunkadmins_splunkd_source`) OR (`splunkadmins_splunkuf_source`) "Detected system time adjusted" OR "System time went " `splunkadmins_uf_timeshifting`\
+| rex "by (?P<timePeriod>\d+)ms\.$"\
+| rex "by (?P<timePeriodInSecs>[\d\.]+) seconds$"\
+| eval timePeriod=if(isnotnull(timePeriodInSecs),timePeriodInSecs*1000,timePeriod)\
+| where timePeriod > 100000 \
+| dedup host\
+| fields host, _raw
+disabled = 1
+
+[IndexerLevel - ClusterMaster Advising SearchOrRep Factor Not Met]
+action.keyindicator.invert = 0
+alert.suppress = 1
+alert.suppress.period = 3h
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 4
+counttype = number of events
+cron_schedule = */10 * * * *
+description = Chance the alert requires action? High. The cluster master shows that either not all data is searchable or rep/search factors are not met
+dispatch.earliest_time = -1h
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.charting.chart = area
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = | rest /services/cluster/searchhead/generation `splunkadmins_clustermaster_host` | where is_searchable!=1 OR replication_factor_met!=1 OR search_factor_met!=1 | table is_searchable, replication_factor_met, search_factor_met\
+```If the cluster master advises there is an issue, you probably want to check why```
+disabled = 1
+
+[IndexerLevel - Future Dated Events that appeared in the last week]
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 2
+counttype = number of events
+cron_schedule = 0 0 * * 2
+description = Chance the alert requires action? High. Search for any data that has future based time-stamping, this likely shows a date parsing issue or a server sending logs with a date in the future
+dispatch.earliest_time = -1w
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```Data should not appear from the future...this alert finds that data so it can be investigated.
This query could be | tstats count, max(_time), min(_time) where index=* earliest=+5m latest=+10y groupby index, sourcetype, _time, _indextime span=1d, and then drop the field named 'ahead', but in 8.2.5 this is slower than the index= version``` \
+index=* earliest=+5m latest=+10y `splunkadmins_future_dated`\
+| eval ahead=abs(now() - _time)\
+| eval indextime=_indextime\
+| bin span=1d indextime \
+| eval timeToLookBack=now()-(60*60*24*7)\
+| stats avg(ahead) as averageahead, max(_time) AS maxTime, min(_time) as minTime, count, first(timeToLookBack) AS timeToLookBack by host, sourcetype, index, indextime\
+| where indextime>timeToLookBack AND averageahead > 1000\
+| eval averageahead =tostring(averageahead, "duration")\
+| eval invesMaxTime=if(minTime=maxTime,maxTime+1,maxTime)\
+| eval investigationQuery="index=" . index . " host=" . host . " sourcetype=\"" . sourcetype . "\" earliest=" . minTime . " latest=" . invesMaxTime . " _index_earliest=" . timeToLookBack . " | eval indextime=strftime(_indextime, \"%+\")"\
+| eval indextime=strftime(indextime, "%+"), maxTime = strftime(maxTime, "%+"), minTime = strftime(minTime, "%+")\
+| table host, sourcetype, index, averageahead, indextime, minTime, maxTime, count, investigationQuery
+disabled = 1
+
+[IndexerLevel - Failures To Parse Timestamp Correctly (excluding breaking issues)]
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 2
+auto_summarize.dispatch.earliest_time = -1d@h
+counttype = number of events
+cron_schedule = 0 3 * * 5
+description = Chance the alert requires action? Moderate. Failures to parse incoming log file timestamps; this excludes a timestamp failure due to the event being broken (there is a separate alert for breaking issues)
+dispatch.earliest_time = -1w
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```Timestamp parsing has failed, and it doesn't appear to be related to the event being broken due to having too many lines, that is a separate alert that may trigger a timestamp parsing issue (excluded from this alert as that issue needs to be resolved first) \
+Please note that you may see this particular warning on data that is sent to the nullQueue using a transforms.conf.
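(For reference, such a null queue route is typically a transforms.conf stanza along the lines of REGEX = ., DEST_KEY = queue, FORMAT = nullQueue, referenced from a props.conf TRANSFORMS- entry; the exact stanza names vary per environment.)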
Obviously you won't see this in the index but you will see the warning because the time parsing occurs before the transforms.conf occurs\
+This alert now checks for at least 2 failures, and header entries can often trigger 2 entries in the log files about timestamp parsing failures...\
+Finally, one strange edge case is that a newline inserted into the log file (by itself with no content before/afterward) can trigger the warning but nothing will get indexed; multiline_event_extra_waittime, time_before_close and EVENT_BREAKER can resolve this edge case```\
+index=_internal sourcetype=splunkd ("Failed to parse timestamp" "Defaulting to timestamp of previous event") OR "Breaking event because limit of " OR "outside of the acceptable time window" (`splunkadmins_splunkd_source`) (`indexerhosts`) OR (`heavyforwarderhosts`) `splunkadmins_failuretoparse_timestamp`\
+| bin _time span=`splunkadmins_failuretoparse_timestamp_binperiod` \
+| eval host=data_host, source=data_source, sourcetype=data_sourcetype\
+| rex "source::(?P<source>[^|]+)\|host::(?P<host>[^|]+)\|(?P<sourcetype>[^|]+)" \
+| eventstats count(eval(isnotnull(data_host))) AS hasBrokenEventOrTruncatedLine, count(eval(searchmatch("outside of the acceptable time window"))) AS outsideTimewindow by _time, host, source, sourcetype\
+| where hasBrokenEventOrTruncatedLine=0 AND isnull(data_host) AND NOT searchmatch("outside of the acceptable time window")\
+```To investigate further we want the previous timestamp that Splunk used for the event in question, that way we can see what it looks like in raw format...```\
+| rex "Defaulting to timestamp of previous event \((?P<previousTimeStamp>[^)]+)"\
+| eval previousTimeStamp=strptime(previousTimeStamp, "%a %b %d %H:%M:%S %Y")\
+| stats count, min(_time) AS firstSeen, max(_time) AS mostRecent, first(previousTimeStamp) AS recentExample, sum(outsideTimewindow) AS outsideTimewindow by host, sourcetype, source\
+| where count>`splunkadmins_failuretoparse_timestamp_count`\
+| stats sum(count) AS count, min(firstSeen) AS firstSeen, max(mostRecent) AS mostRecent, first(recentExample) AS recentExample, values(source) AS sourceList, sum(outsideTimewindow) AS outsideTimewindow by host, sourcetype\
+| search ```Allow exclusions based on count or similar...``` `splunkadmins_failuretoparse_timestamp2`\
+| eval invesEnd=recentExample+1\
+| eval invesDataSource=sourceList\
+| eval invesDataSource=if(mvcount(invesDataSource)>1,mvjoin(invesDataSource,"\" OR source=\""),invesDataSource)\
+| eval invesDataSource = "source=\"" + invesDataSource + "\""\
+| eval invesDataSource = replace(invesDataSource, "\\\\", "\\\\\\\\")\
+| eval investigationQuery="```The investigation query may find zero data if the data was sent to the null queue by a transforms.conf as the time parsing occurs before the transforms occur. If this source/sourcetype has a null queue you may need to exclude it from this alert. Note that the host= can be inaccurate if host overrides are in use in transforms.conf, if this query finds no results remove host=...``` index=* host=" . host . " sourcetype=\"" . sourcetype . "\" " . invesDataSource . " earliest=" . recentExample . " latest=" . invesEnd .
" | eval indextime=strftime(_indextime, \"%+\")" \ +| eval mostRecent=strftime(mostRecent, "%+"), firstSeen=strftime(firstSeen, "%+")\ +| eval outsideAcceptableTimeWindow=if(outsideTimewindow!=0,"Timestamp parsing failed due to been outside the acceptable time window","No")\ +| fields - recentExample, invesEnd, invesDataSource, outsideTimewindow\ +| sort - count +disabled = 1 + +[IndexerLevel - IndexConfig Warnings from Splunk indexers] +action.keyindicator.invert = 0 +alert.suppress = 0 +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 2 +counttype = number of events +cron_schedule = 57 2 * * * +description = Chance the alert requires action? High. IndexConfig warnings are usually a problem so should be investigated... +dispatch.earliest_time = -24h@h +dispatch.latest_time = now +display.events.fields = ["source","sourcetype","host"] +display.general.type = statistics +display.page.search.tab = statistics +display.visualizations.charting.chart = bar +enableSched = 1 +quantity = 0 +relation = greater than +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = ```IndexConfig warnings are generally a problem``` \ +index=_internal "WARN IndexConfig" OR "ERROR IndexConfig" OR (ClusterMasterControlHandler " ERROR " OR " WARN " NOT "*No new dry run will be performed" ```This appears to occur when CM is not aware of a hot->warm roll, self-recovers in most cases as the .bucketManifest/directory is fine according to support``` NOT "The cluster manager already has committed size for this bucket") (`splunkadmins_splunkd_source`) `indexerhosts` OR `cluster_masters` `splunkadmins_indexconfig_warn` \ +| eval message=coalesce(message,event_message) \ +| stats count by message, host \ +| eval why="Bundle validation failure on indexing tier, please investigate" \ +| table why, message, count, host +disabled = 1 + +[IndexerLevel - Indexer Queues May Have Issues] +action.keyindicator.invert = 0 +alert.suppress = 1 +alert.suppress.period = 1h +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 3 +counttype = number of events +cron_schedule = */11 * * * * +description = Chance the alert requires action? Low. One or more indexer queues have been filled for a period of time and may require investigation. +dispatch.earliest_time = -11m +dispatch.latest_time = now +display.events.fields = ["host","source","sourcetype"] +display.general.type = statistics +display.page.search.tab = statistics +display.visualizations.charting.chart = area +enableSched = 1 +quantity = 0 +relation = greater than +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = ```This alert is borrowed from the monitoring console. When the queues are filled there is an issue in the indexer cluster!```\ +index=_internal `indexerhosts` `splunkadmins_metrics_source` sourcetype=splunkd group=queue \ +| eval ingest_pipe = if(isnotnull(ingest_pipe), ingest_pipe, "none") | search ingest_pipe=*\ +| eval name=case(name=="aggqueue","2 - Aggregation Queue",\ + name=="indexqueue", "4 - Indexing Queue",\ + name=="parsingqueue", "1 - Parsing Queue",\ + name=="typingqueue", "3 - Typing Queue",\ + name=="splunktcpin", "0 - TCP In Queue",\ + name=="tcpin_cooked_pqueue", "0 - TCP In Queue") \ +| eval max=if(isnotnull(max_size_kb),max_size_kb,max_size) \ +| eval curr=if(isnotnull(current_size_kb),current_size_kb,current_size) \ +| eval fill_perc=round((curr/max)*100,2) \ +| eval combined = host . "_pipe_" . 
ingest_pipe\ +| bin _time span=1m\ +| stats Median(fill_perc) AS "fill_percentage" by combined, _time, name \ +| where (fill_percentage>`splunkadmins_indexerqueue_fillperc_nonindexqueue` AND name!="4 - Indexing Queue") OR (fill_percentage>`splunkadmins_indexerqueue_fillperc_indexqueue` AND name="4 - Indexing Queue") \ +| eventstats dc(combined) AS servercount \ +| eventstats count AS totalcount by combined, name \ +| where totalcount>`splunkadmins_indexerqueue_count` +disabled = 1 + +[IndexerLevel - Indexer replication queue issues to some peers] +action.keyindicator.invert = 0 +alert.suppress = 1 +alert.suppress.period = 90m +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 3 +counttype = number of events +cron_schedule = */11 * * * * +description = Chance the alert requires action? Low. Indexer replication queue issues to some peers may prevent indexing of data and result in a large index queue +dispatch.earliest_time = -1h +dispatch.latest_time = now +display.events.fields = ["host","source","sourcetype"] +display.general.type = statistics +display.page.search.mode = fast +display.page.search.tab = statistics +display.visualizations.charting.chart = area +enableSched = 1 +quantity = 0 +relation = greater than +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = ```If the replication queue is full, then depending on the replication factor this can stop / slow indexing. Note this alert has been set to find many results to remove false alarms... \ +Unfortunately this setting is not tunable, at the time of writing (7.0.0) the queue size is 20. If the "has room now" appears shortly afterward this is not an issue.```\ +index=_internal `indexerhosts` "replication queue for " "full" sourcetype=splunkd (`splunkadmins_splunkd_source`)\ +| rename peer AS guid \ +| join guid [| rest /services/search/distributed/peers `splunkadmins_restmacro` | fields guid peerName]\ +| bin _time span=10m \ +| stats count by peerName, _time \ +| where count>`splunkadmins_indexer_replication_queue_count` +disabled = 1 + +[IndexerLevel - Rolling Hot Bucket Failure] +action.keyindicator.invert = 0 +alert.suppress = 0 +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 5 +counttype = number of events +cron_schedule = 0,15,30,45 * * * * +description = Chance the alert requires action? High. Hot buckets are throwing errors while trying to roll +dispatch.earliest_time = -15m@m +dispatch.latest_time = now +display.events.fields = ["host","source","sourcetype"] +enableSched = 1 +quantity = 0 +relation = greater than +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = ```If this alert fires, we are potentially out of disk in the hot section or something else has gone wrong``` \ +index=_internal `indexerhosts` "Not rolling hot buckets on further errors to this target" sourcetype=splunkd (`splunkadmins_splunkd_source`) +disabled = 1 + +[AllSplunkEnterpriseLevel - Losing Contact With Master Node] +action.email.reportServerEnabled = 0 +action.keyindicator.invert = 0 +alert.suppress = 0 +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 2 +auto_summarize.dispatch.earliest_time = -1d@h +counttype = number of events +cron_schedule = 11 * * * * +description = Chance the alert requires action? Moderate. One or more splunk indexers have lost contact with the splunk cluster master server. This may require additional investigation. 
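+# Not part of the original alert: if this fires, one quick manual check is to ask the manager
+# for its view of the cluster (a sketch; on older versions the endpoint is
+# /services/cluster/master/info, and `splunkadmins_clustermaster_host` is the same macro the
+# ClusterMaster alert above uses to target the manager):
+#   | rest /services/cluster/manager/info `splunkadmins_clustermaster_host`
+#   | table splunk_server, maintenance_mode, rolling_restart_flag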
+dispatch.earliest_time = -1h
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.statistics.drilldown = row
+display.visualizations.charting.chart = line
+display.visualizations.show = 0
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```Either the manager (previously master) is down or the indexers are having issues contacting the manager. Alternatively the search head to manager connection is having issues``` \
+index=_internal sourcetype=splunkd (`splunkadmins_splunkd_source`) (`splunkenterprisehosts` CMSearchHead OR GenerationGrabber OR (CMMasterProxy down)) OR (`indexerhosts` cluster master CMSlave WARN OR ERROR)\
+NOT [`splunkadmins_shutdown_time(splunkadmins_clustermaster_oshost,30,60)`]\
+| fillnull master value="N/A"\
+| rex "(?s)^(\S+\s+){3}(?P<error>.*)"\
+| stats count, latest(_time) AS mostrecent, earliest(_time) AS firstseen, values(host) AS hosts by error, master\
+| eval mostrecent=strftime(mostrecent, "%+"), firstseen=strftime(firstseen, "%+")\
+| table hosts, count, master, firstseen, mostrecent, error\
+| where (match(error,"GenerationGrabber") AND count>10) OR (match(error, "(CMSearchHead|CMMasterProxy|CMSlave)") AND count>1)
+disabled = 1
+
+[IndexerLevel - Uneven Indexed Data Across The Indexers]
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 2
+counttype = number of events
+cron_schedule = 56 1,5,9,13,17,21 * * *
+description = Chance the alert requires action? Moderate. The data has not been spread across the indexers correctly during the last 4 hour block
+dispatch.earliest_time = -4h@h
+dispatch.latest_time = @h
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.charting.chart = area
+display.visualizations.charting.chart.stackMode = stacked100
+enableSched = 1
+quantity = 10
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = | tstats summariesonly=t count WHERE index="*" by splunk_server _time span=10m\
+```If the balance of data between indexer cluster members becomes very unbalanced then the searches tend to spend more CPU on a particular indexer / search peer and this eventually creates issues```\
+| sort _time \
+| eventstats sum(count) AS totalCountForTime, dc(splunk_server) AS indexers by _time \
+| eval perc=round((count/totalCountForTime)*100,2) \
+| eval expectedShare = 100 / indexers\
+| eval perc = 100 - (expectedShare / perc)*100\
+| where perc>`splunkadmins_uneven_indexed_perc`
+disabled = 1
+
+[IndexerLevel - Weekly Broken Events Report]
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 2
+counttype = number of events
+cron_schedule = 0 5 * * 4
+description = Chance the alert requires action? Moderate. These events are being broken due to reaching the maximum number of lines limit...in Splunk 7 and above the Monitoring Console, Indexing -> Inputs -> Data Quality will help here...
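+# Events break at the line count set by MAX_EVENTS in props.conf, which is the limit the
+# warning below reports; a sketch of raising it for an affected sourcetype on the
+# indexers/heavy forwarders (the sourcetype name and value are illustrative only):
+#   [my:multiline:sourcetype]
+#   MAX_EVENTS = 1000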
+dispatch.earliest_time = -1w
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```The event that came in was greater than the maximum number of lines that were configured, therefore it was broken into multiple events... \
+If running Splunk 7 or newer then refer to the monitoring console Indexing -> Inputs -> Data Quality```\
+index=_internal AggregatorMiningProcessor "Breaking event because limit of" sourcetype=splunkd (`splunkadmins_splunkd_source`) `splunkadmins_weekly_brokenevents`\
+| rex "Breaking event because limit of (?P<curlimit>\d+)" \
+| stats max(_time) AS mostRecent, min(_time) AS firstSeen, count by data_sourcetype, data_host, curlimit\
+| eval longerThan=curlimit-1\
+| eval invesLatest = if(mostRecent==firstSeen,mostRecent+1,mostRecent)\
+| rename data_sourcetype AS sourcetype, data_host AS host\
+| eval investigationQuery="```If no results are found prepend the earliest=/latest= with _index_ (eg _index_earliest=...) and expand the timeframe searched over, as the parsed timestamps from the data do not have to exactly match the time the warnings appeared...``` index=* host=" . host . " sourcetype=\"" . sourcetype . "\" linecount>" . longerThan . " earliest=" . firstSeen . " latest=" . invesLatest\
+| fields - firstSeen, longerThan, invesLatest\
+| eval mostRecent=strftime(mostRecent, "%+")\
+| sort - count
+disabled = 1
+
+[IndexerLevel - Weekly Truncated Logs Report]
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 2
+counttype = number of events
+cron_schedule = 0 5 * * 2
+description = Chance the alert requires action? Moderate. These events are being truncated due to hitting the truncation limit; in Splunk 7 and above the Monitoring Console, Indexing -> Inputs -> Data Quality will help here...
+dispatch.earliest_time = -1w
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```The line was truncated due to length, the TRUNCATE setting may need tweaking (or it may be just bad data coming in)\
+Also refer to the Monitoring Console, Indexing -> Inputs -> Data Quality\
+If you are in a (very) performance sensitive environment you might want to remove the rex/eval lines for the data_host field and let the admin update the investigation query manually```\
+index=_internal "Truncating line because limit of" sourcetype=splunkd (`splunkadmins_splunkd_source`) (`heavyforwarderhosts`) OR (`indexerhosts`) `splunkadmins_weekly_truncated`\
+| rex "Truncating line because limit of (?P<curlimit>\d+) bytes.*with a line length >= (?P<approxlinelength>\S+)" \
+| rex field=data_host "(?P<data_host>[^\.]+)"\
+| eval data_host=data_host . "*"\
+| stats min(_time) AS firstSeen, max(_time) AS lastSeen, count, avg(approxlinelength) AS avgApproxLineLength, max(approxlinelength) AS maxApproxLineLength, values(data_host) AS hosts by data_sourcetype, curlimit\
+| rename data_sourcetype AS sourcetype\
+| eval hostList=if(mvcount(hosts)>1,mvjoin(hosts," OR host="),hosts)\
+| eval hostList="host=" .
hostList\
+| eval avgApproxLineLength = round(avgApproxLineLength)\
+| eval invesLastSeen=if(firstSeen==lastSeen,lastSeen+1,lastSeen)\
+| eval firstSeen=firstSeen-10\
+| eval invesLastSeen=invesLastSeen+10\
+| eval investigationQuery="```Find examples where the truncation limit has been reached. The earliest/latest time is based on the warning messages in the Splunk logs, they may need customisation!``` index=* sourcetype=" . sourcetype . " " . hostList . " earliest=" . firstSeen . " latest=" . invesLastSeen . " | where len(_raw)=" . curlimit\
+| sort - count\
+| eval lastSeen=strftime(lastSeen, "%+")\
+| table sourcetype, curlimit, count, avgApproxLineLength, maxApproxLineLength, lastSeen, investigationQuery\
+| where count>`splunkadmins_weekly_truncated_count`
+disabled = 1
+
+[IndexerLevel - Valid Timestamp Invalid Parsed Time]
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 1
+auto_summarize.dispatch.earliest_time = -1d@h
+counttype = number of events
+cron_schedule = 0 2 * * 3
+description = Chance the alert requires action? Moderate. The timestamp was parsed but an error was thrown to advise that the timestamp does not appear to be correct
+dispatch.earliest_time = -1w
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```The timestamp parsing did run but the timestamp found did not match previous events so the time parsing may need a review```\
+index=_internal sourcetype=splunkd (`splunkadmins_splunkd_source`) (`indexerhosts`) OR (`heavyforwarderhosts`) \
+"outside of the acceptable time window. If this timestamp is correct, consider adjusting" \
+OR "is too far away from the previous event's time" \
+OR "is suspiciously far away from the previous event's time" `splunkadmins_valid_timestamp_invalidparsed`\
+| rex "source::(?P<source>[^|]+)\|host::(?P<host>[^|]+)\|(?P<sourcetype>[^|]+)"\
+ ```The goal of this part of the search was to obtain the messages that are relating to this particular host/source/sourcetype, however since the message includes a time we cannot use values(message) without getting a huge number of values, therefore we use cluster to obtain the unique values. Since we want the original start/end times we use labelonly=true```\
+| cluster labelonly=true \
+| eval message=coalesce(message,event_message)\
+| stats count, min(_time) AS firstSeen, max(_time) AS lastSeen, first(message) AS message by host, source, sourcetype, cluster_label\
+```While "A possible timestamp match (...) is outside of the acceptable time window" and "Time parsed (...) is too far away from the previous event's time" result in the current indexing time being used, the "Accepted time (...)
is suspiciously far away from the previous event's time" is accepted and therefore we need to expand the investigation query time to include this time range as well!``` \
+| rex field=message "Accepted time \((?P<acceptedTime>[^\)]+)"\
+| eval acceptedTime=strptime(acceptedTime, "%a %b %d %H:%M:%S %Y")\
+| eval firstSeen=if(acceptedTime<firstSeen,acceptedTime,firstSeen)\
+total_run_time>300\
+`splunkadmins_longrunning_searches`\
+```At this point we have a list of searches minus the various exclusions, we now filter out the real time searches as they will always run for a long period of time...```\
+| regex search_id!="rt.*" | table savedsearch_name, search_id, total_run_time, search_et, search_lt, api_et, api_lt, scan_count, _time, user, info, host | eval search_et=strftime(search_et, "%d/%m/%Y %H:%M"), search_lt=strftime(search_lt, "%d/%m/%Y %H:%M"), api_et=strftime(api_et, "%d/%m/%Y %H:%M"), api_lt=strftime(api_lt, "%d/%m/%Y %H:%M"), _time=strftime(_time, "%d/%m/%Y %H:%M")
+disabled = 1
+
+[SearchHeadLevel - Realtime Scheduled Searches are in use]
+action.email.message.report = The scheduled report '$name$' has run. Please fix the searches listed
+action.email.reportServerEnabled = 0
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 1
+auto_summarize.dispatch.earliest_time = -1d@h
+counttype = number of events
+cron_schedule = 8 0,4,8,12,16,20 * * *
+description = Chance the alert requires action? High. Realtime searches should not be scheduled, please only enable this alert if it is relevant for your environment. Fields such as relation or alert_condition can be used if you want to look for realtime alerts only. Search Head specific? Yes
+dispatch.earliest_time = -24h@h
+dispatch.latest_time = now
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.show = 0
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = | rest /servicesNS/-/-/saved/searches `splunkadmins_restmacro` timeout=900 \
+| search ```Find realtime scheduled searches, they should not be enabled``` `splunkadmins_realtime_scheduledsearches`\
+| table title, author, realtime_schedule, cron_schedule, description, disabled, dispatch.earliest_time, dispatch.index_earliest, dispatch.index_latest, dispatch.latest_time, dispatchAs, eai:acl.app, eai:acl.owner, updated, qualifiedSearch, is_scheduled, next_scheduled_time, alert_type, schedule_priority\
+| search dispatch.earliest_time=rt* next_scheduled_time!=""
+disabled = 1
+
+[SearchHeadLevel - Scheduled Searches That Cannot Run]
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 2
+counttype = number of events
+cron_schedule = 16 6,10,18 * * *
+description = Chance the alert requires action? High. As found in the DMC console, moving it into an alert so we can get alerted to the problem rather than checking a dashboard/log about this. Can be fixed by the end user? Yes
+dispatch.earliest_time = -8h@h
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.charting.chart = area
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```These searches are scheduled but for some reason cannot run (eg.
invalid search syntax)```\
+index=_internal `searchheadhosts` sourcetype=scheduler NOT "The 'require' command received zero" `splunkadmins_scheduledsearches_cannot_run`\
+```Additional rex due to someone using app= inside their saved search name...```\
+| rex "app=\"(?P<app>[^\"]+)\""\
+| eval message=coalesce(message,event_message)\
+| fillnull message\
+ ```The below 3 lines will catch map/lookup errors or similar that look like message="Error in 'map': Did not find value for required attribute 'attr'.". No actions executed. This may or may not be a good thing...``` \
+| rex "- savedsearch_id=\"(?P<user2>[^;]+);(?P<app2>[^;]+);(?P<savedsearch_name_2>[^\"]+)" \
+| eval savedsearch_name=coalesce(savedsearch_name,savedsearch_name_2), app=coalesce(app,app2), user=coalesce(user,user2) \
+| fillnull status value="error" \
+| eval search_head_cluster=`search_head_cluster` \
+| eventstats values(user) AS user by savedsearch_name, app, search_head_cluster \
+| eval user=if(mvcount(user)>1,mvfilter(!(match(user, "nobody"))),user) \
+| stats max(_time) AS mostRecentlySeen, values(success) AS success by message, savedsearch_name, app, log_level, user, status, search_head_cluster \
+| stats count(eval(status="success")) AS successCount, count(eval(success==0)) AS reportFailureCount, count(eval(searchmatch("log_level=WARN OR log_level=ERROR OR status=delegated_remote_error"))) AS warnerrorcount, max(mostRecentlySeen) AS mostRecentlySeen, values(status) AS status, values(message) AS message by savedsearch_name, app, user, search_head_cluster\
+| where warnerrorcount>0\
+| append \
+ [ search index=_internal `searchheadhosts` sourcetype=scheduler status=delegated_remote_error \
+ | eval message=coalesce(message,event_message)\
+ | eval search_head_cluster=`search_head_cluster` \
+ | stats max(_time) AS mostRecentlySeen, first(message) AS message by savedsearch_name, app, log_level, user, status, search_head_cluster] \
+| selfjoin overwrite=true keepsingle=true savedsearch_name, app, user, search_head_cluster \
+| append \
+ [ search ```macro failures in the search syntax result in the log only appearing in splunkd, and the absence of delegated_remote_completion in scheduler.log``` \
+ index=_internal `searchheadhosts` ERROR "failed job" sourcetype=splunkd `splunkadmins_splunkd_source` saved_search=* \
+ | search ```Exclude time periods where shutdowns were occurring``` NOT \
+ [ `splunkadmins_shutdown_time(searchheadhosts,0,0)`] \
+ | rex "saved_search=([^;]+);(?P<app>[^;]+);(?P<savedsearch_name>.*?) err=" \
+ | rex "(?P<messagewithoutheader>saved_search=.*uri=)http(s)?://[^/]+(?P<messagewithoutheader2>.*)" \
+ | eval messagewithoutheader=messagewithoutheader . messagewithoutheader2 \
+ | eval message=coalesce(message,event_message)\
+ | eval search_head_cluster=`search_head_cluster` \
+ | stats max(_time) AS mostRecentlySeen, first(message) AS message by messagewithoutheader, savedsearch_name, app, search_head_cluster \
+ | eval log_level="ERROR" \
+ | fields - messagewithoutheader \
+ | sort - mostRecentlySeen] \
+| selfjoin overwrite=true keepsingle=true savedsearch_name, app, search_head_cluster \
+| where successCount<1 \
+| sort - warnerrorcount, savedsearch_name \
+| rename message as Message, count as runCount \
+| eval mostRecentlySeen = strftime(mostRecentlySeen, "%+") \
+| fields - cluster_label, status, savedsearch_id, host
+disabled = 1
+
+[SearchHeadLevel - Scheduled Searches without a configured earliest and latest time]
+action.email.message.report = The scheduled report '$name$' has run.
Please fix the searches listed
+action.email.reportServerEnabled = 0
+action.keyindicator.invert = 0
+alert.suppress = 1
+alert.suppress.period = 8h
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 3
+auto_summarize.dispatch.earliest_time = -1d@h
+counttype = number of events
+cron_schedule = 8 0,4,8,12,16,20 * * *
+description = Chance the alert requires action? High. A scheduled search without time limits could kill the Splunk indexers with CPU / IO issues depending on the criteria of the search. Can be fixed by the end user? Yes. Search Head specific? Yes
+dispatch.earliest_time = -24h@h
+dispatch.latest_time = now
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.show = 0
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = | rest /servicesNS/-/-/saved/searches `splunkadmins_restmacro` count=0 timeout=900 search="disabled=0" search="is_scheduled=1" f=next_scheduled_time f=title f=qualifiedSearch f=eai:* f=dispatch* f=cron_schedule f=realtime_schedule f=is_visible \
+| search ```Find scheduled searches where they are searching over all time, this is generally not good practice and can cause performance issues``` \
+dispatch.earliest_time="" OR dispatch.earliest_time="0" next_scheduled_time!="" `splunkadmins_scheduledsearches_without_earliestlatest` \
+| table title, author, realtime_schedule, cron_schedule, description, disabled, dispatch.earliest_time, dispatch.latest_time, dispatch.index_earliest, dispatch.index_latest, eai:acl.app, eai:acl.owner, updated, qualifiedSearch, is_scheduled, is_visible, next_scheduled_time, splunk_server \
+| regex qualifiedSearch="^\s*(search|tstats) " \
+| rex field=qualifiedSearch "earliest=(?P<earliestTime>\S+)" \
+| where isnull(earliestTime) \
+| fields - earliestTime \
+| rename eai:acl.owner AS owner, eai:acl.app AS app
+disabled = 1
+
+[SearchHeadLevel - Scheduled searches not specifying an index]
+action.analyzeioc.param.verbose = 0
+action.email.message.report = The scheduled report '$name$' has run. Please fix the searches listed
+action.email.reportServerEnabled = 0
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 3
+auto_summarize.dispatch.earliest_time = -1d@h
+counttype = number of events
+cron_schedule = 14 6 * * 1-5
+description = Chance the alert requires action? High. These searches are either using index=* or not specifying an index at all and relying on the default set of indexes. Can be fixed by the end user? Yes. Search Head specific?
Yes
+dispatch.earliest_time = -24h@h
+dispatch.latest_time = now
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.show = 0
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = | rest /servicesNS/-/-/saved/searches `splunkadmins_restmacro`\
+ ```Look over all scheduled searches and find those not specifying/narrowing down to an index, or using the index=* trick```\
+| table title, eai:acl.owner, description, eai:acl.app, qualifiedSearch, next_scheduled_time\
+| search next_scheduled_time!="" `splunkadmins_scheduledsearches_without_index` \
+| regex qualifiedSearch!=".*index\s*(!?)=\s*([^*]|\*\S+)" \
+| regex qualifiedSearch="^\s*search "\
+| regex qualifiedSearch!="^\s*search\s*\[\s*\|\s*inputlookup"\
+| rex field=qualifiedSearch "(?s)^(?P<exampleQueryToDetermineIndexes>[^\|]+)"\
+| regex exampleQueryToDetermineIndexes!="\`"\
+| eval exampleQueryToDetermineIndexes=exampleQueryToDetermineIndexes . "| stats values(index) AS index | format | fields search | eval search=replace(search,\"\\)\",\"\"), search=replace(search,\"\\(\",\"\"), search=if(search==\"NOT \",\"No indexes found\",search)"\
+| rename eai:acl.owner AS owner, eai:acl.app AS Application
+disabled = 1
+
+[SearchHeadLevel - Script failures in the last day]
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 2
+counttype = number of events
+cron_schedule = 0 22 * * *
+description = Chance the alert requires action? Moderate. Scripts or webhooks are throwing errors which may indicate an issue; requires "SearchHeadLevel - RMD5 to savedsearch_name lookupgen report" to translate the search name accurately.
+dispatch.earliest_time = -1d
+dispatch.latest_time = now
+display.general.type = statistics
+display.page.search.tab = statistics
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```Shell scripts running from Splunk are failing to run or throwing errors, or another sendmodalert failure has occurred``` \
+index=_internal `searchheadhosts` sourcetype=splunkd log_level="ERROR" OR log_level="WARN" `splunkadmins_scriptfailures` command="runshellscript" OR ScriptRunner OR "Alert script returned error code" OR "ERROR sendmodalert" OR "WARN sendmodalert" OR "Killing script" OR ("ERROR SearchScheduler" "sendalert") NOT sendemail.py NOT "InsecureRequestWarning" \
+| bin _time span=30s \
+| eval search_head=host\
+| eval search_head_cluster=`search_head_cluster`\
+``` I'm unsure when the extra id appeared in here but it adds more context than threadid alone and reduces duplicates.
Visible in 9.1.3``` \
+| rex "^[^\[]+\[(?P<threadid>\S+)\s+(?P<other_id>[^\]]+)" \
+| fillnull threadid other_id value="N/A" \
+| stats values(_raw) AS _raw, values(search_head) AS search_head, values(search_head_cluster) AS search_head_cluster by _time, threadid, host, other_id \
+| stats values(_raw) AS _raw, values(search_head) AS search_head, values(search_head_cluster) AS search_head_cluster by _time, threadid, host \
+| search _raw!="*404: Not Found" _raw!="*Connection refused>" _raw!="*HTTP Error 403: Forbidden" _raw!="*Name or service not known>" NOT (_raw="*Connection reset by peer" _raw="*sendalert' command*") \
+| rex field=_raw "[/\\\]dispatch[/\\\](?P<sid_from_dispatch>[^/]+)" \
+| rex "sid:(?P<sid2>\S+)" \
+| eval sid=coalesce(sid2,sid_from_dispatch) \
+| eval sid=mvdedup(sid) \
+| append \
+ [ search index=_internal `searchheadhosts` sourcetype=scheduler WARN SavedSplunker maximum time allowed \
+ | eval search_head=host \
+ | eval search_head_cluster=`search_head_cluster` \
+ | stats count, last(event_message) AS event_message, last(component) AS component, latest(_time) AS _time, values(search_head) AS search_head, values(search_head_cluster) AS search_head_cluster by savedsearch_id \
+ | rex field=savedsearch_id "^(?P<username3>[^;]+);(?P<app3>[^;]+);(?P<searchname>.*)" \
+ | eval _raw = "count=" . count . " " . component . " " . event_message \
+ | rex field=_raw "sid=\"(?P<sid>[^\"]+)" ] \
+| `search_type_from_sid(sid)` \
+| lookup splunkadmins_rmd5_to_savedsearchname RMDvalue AS report OUTPUT savedsearch_name\
+| eval searchname=coalesce(savedsearch_name, searchname) \
+| `base64decode(base64appname)` \
+| eval app4="N/A" \
+| eval app=coalesce(app,base64appname,app3,app4) \
+| eval _raw=mvindex(_raw,0,20), searchname=mvindex(searchname,0,20) \
+| `base64decode(base64username)` \
+| eval username=coalesce(username,base64username,username3) \
+| fillnull value="N/A" app, searchname, username, search_head_cluster \
+| stats count, values(_raw) AS _raw by app, searchname, username, search_head_cluster, _time \
+| table _time, username, app, searchname, search_head_cluster, _raw
+disabled = 1
+
+[SearchHeadLevel - Splunk Max Historic Search Limits Reached]
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 2
+counttype = number of events
+cron_schedule = 0 11 * * *
+description = Chance the alert requires action? Moderate. Splunk Max Historic Search Limits Reached
+dispatch.earliest_time = -1d
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```Once the max historic search limit has been reached the search jobs will be queued, this can be an issue if the limit is set too low (or too high)``` \
+index=_internal `splunkenterprisehosts` "The maximum number of historical concurrent system-wide searches has been reached" OR "The system is approaching the maximum number of historical searches that can be run concurrently" (`splunkadmins_splunkd_source`) \
+| fields _time, host
+disabled = 1
+
+[AllSplunkEnterpriseLevel - Splunk Scheduler skipped searches and the reason]
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 2
+counttype = number of events
+cron_schedule = 57 2,6,10,14,18,22 * * *
+description = Chance the alert requires action? Low.
+dispatch.earliest_time = -4h@h
+dispatch.latest_time = now
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.charting.chart = area
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```For some reason this search had to be skipped, this might be due to over scheduling the search or an inefficient search or similar``` index=_internal sourcetype=scheduler status=skipped source=*scheduler.log `splunkenterprisehosts` \
+| search ```Exclude unable to distribute to peer messages where we sent the shutdown signal to the peer``` NOT [`splunkadmins_shutdown_list(splunkenterprisehosts,0,0)`]\
+| search ```Skipped searches can be expected for up to 10 minutes after an indexer has been shut down...``` NOT [`splunkadmins_shutdown_time(indexerhosts,0,600)`]\
+| fillnull concurrency_category concurrency_context concurrency_limit\
+| stats count, earliest(_time) AS firstSeen, latest(_time) AS lastSeen by savedsearch_id, reason, app, concurrency_category, concurrency_context, concurrency_limit, search_type, user, host \
+| eval firstSeen = strftime(firstSeen, "%+"), lastSeen=strftime(lastSeen, "%+")
+disabled = 1
+
+[SearchHeadLevel - Splunk Users Violating the Search Quota]
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 1
+counttype = number of events
+cron_schedule = 0 3 * * *
+description = Chance the alert requires action? Low. These users have reached the search quota but may not be aware of this issue.
+dispatch.earliest_time = -1d
+dispatch.latest_time = now
+display.events.fields = ["source","sourcetype","host"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.charting.chart = bar
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```The listed users have reached the search quota and may need to be informed of this, or they may need to be added to the macro for this alert```\
+index=_internal `searchheadhosts` (`splunkadmins_splunkd_source`) "was previously reported to be hung but has completed" OR "The maximum number of concurrent " `splunkadmins_users_violating_searchquota`\
+| rex "Queued job id\s+=\s+(rt_)?(?P<username2>[^_]+)"\
+| eval username=coalesce(username, username2)\
+| bin span=24h _time\
+| top showperc=false limit=200 username, reason, host, _time, provenance\
+| where count>10
+disabled = 1
+
+[SearchHeadLevel - Users exceeding the disk quota]
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 2
+counttype = number of events
+cron_schedule = 21 */2 * * *
+description = Chance the alert requires action? High. One or more users have reached the disk quota limit and may not be aware of this... Can be fixed by the end user? Yes. You may wish to use sendresults with the output of this command... Search Head specific? Yes
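+# A minimal hypothetical sketch (not part of this alert) of the pack/split trick the search
+# below uses to build its multi-line output; the values here are illustrative only:
+#   | makeresults | eval renameToSearch="line one|line two|line three"
+#   | makemv delim="|" renameToSearch | mvexpand renameToSearch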
+dispatch.earliest_time = -2h
+dispatch.latest_time = now
+display.events.fields = ["source","sourcetype","host"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.charting.chart = bar
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```The listed users have reached the maximum disk quota, they may be unaware so it is best to let them know about this issue... \
+Note that the REST API call accesses the jobs list which can expire for ad-hoc jobs in 10 minutes, so this may find zero results. The status.csv inside the dispatch directory records the size per job but it is not indexed by Splunk, so either this alert needs to run very often or it will sometimes run after the issue has occurred and send an empty top 10 jobs list...\
+The introspection index version is called SearchHeadLevel - Users exceeding the disk quota introspection; it is not search head specific but it is also less accurate for disk space used```\
+index=_internal sourcetype=splunkd `splunkenterprisehosts` (`splunkadmins_splunkd_source`) "maximum disk usage quota" `splunkadmins_users_exceeding_diskquota`\
+| stats max(_time) AS mostRecent by username, reason, host\
+| eval mostRecent = strftime(mostRecent, "%+")\
+ ```We use this bizarre field naming so when we append the actual search results we don't have 20 columns of data to read; it also looks nicer in an email. However since we want this over multiple lines and we only want to run the map command once, we use a temporary field which we later expand to a multi-line field. Furthermore mvexpand on the search field can result in multiple rows per search, which is why a temporary field is used```\
+| eval renameToSearch="Why am I, " + username + ", receiving this? |" + reason + " (from) " + host + "|_|Last seen? |" + mostRecent + "|_|Your top 10 largest jobs are listed below"\
+| fields - reason, mostRecent, host\
+ ```The below is the complex attempt to include the largest jobs by querying the REST API. If we use map without the appendpipe we lose the original reason why we are sending this email. The initial workaround of makeresults and eval commands did work but this seemed slightly cleaner. Although there would be other ways to do this...```\
+| append [ | makeresults | eval username="workaround for map errors", body="to pass appinspect" ]\
+| appendpipe\
+ [\
+| map \
+ [| rest /services/search/jobs `splunkadmins_restmacro` \
+ | search ```Attempt to show the customer the top 10 jobs using disk and the related search commands/search names, also if it relates to their scheduled searches or not...``` author=$username$ diskUsage>0 \
+ | fields diskUsage, eai:acl.app, latestTime, label, provenance, runDuration, searchEarliestTime, searchLatestTime, title, updated, ttl \
+ | rename title AS search, eai:acl.app AS app, label AS searchName \
+ | sort - diskUsage \
+ | eval diskUsage=round(diskUsage/1024/1024,2), searchEarliestTime=strftime(searchEarliestTime, "%+"), searchLatestTime=strftime(searchLatestTime, "%+") \
+ | eval expiry=strftime(strptime(updated, "%Y-%m-%dT%H:%M:%S.%3N%z")+ttl, "%+")\
+ | eval runDuration=substr(tostring(runDuration,"duration"),0,8) \
+ | eval search=substr(search,0,300) \
+ | fields - provenance, ttl, updated \
+ | eval searchName=if(searchName=="","ad-hoc search",searchName)\
+ | eval renameToSearch="X"\
+ | table searchName, app, diskUsage, expiry, runDuration, searchEarliestTime, searchLatestTime, search, renameToSearch\
+ | head 10 ] \
+]\
+| where username!="workaround for map errors"\
+| makemv delim="|" renameToSearch\
+| mvexpand renameToSearch\
+| eval search=if(renameToSearch!="X",renameToSearch,search)\
+| table username, searchName, app, diskUsage, expiry, runDuration, searchEarliestTime, searchLatestTime, search
+disabled = 1
+
+[AllSplunkLevel execprocessor errors]
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 1
+counttype = number of events
+cron_schedule = 28 5 * * *
+description = Chance the alert requires action? Low. This alert can be very noisy; it will return any execprocessor errors from any script on any Splunk server!
+dispatch.earliest_time = -24h@h
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.charting.chart = area
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```An attempt to find the execprocessor errors; these are often scripts or applications which are having some kind of issue...```\
+index=_internal "ERROR ExecProcessor" sourcetype=splunkd (`splunkadmins_splunkd_source`) OR (`splunkadmins_splunkuf_source`) NOT "Ignoring: \"" `splunkadmins_execprocessor`\
+| eval message=coalesce(message,event_message)\
+| dedup message, host | fields host _raw
+disabled = 1
+
+[IndexerLevel - Time format has changed multiple log types in one sourcetype]
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 2
+auto_summarize.dispatch.earliest_time = -1d@h
+counttype = number of events
+cron_schedule = 0 2 * * 1
+description = Chance the alert requires action? High. A changing time format is likely due to multiple log types using the same sourcetype, or a date/time parsing issue
+dispatch.earliest_time = -1w
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```This search detects when the time format has changed within the files 1 or more times; the time format per sourcetype should be consistent```\
+index=_internal DateParserVerbose "Accepted time format has changed" sourcetype=splunkd (`splunkadmins_splunkd_source`) (`indexerhosts`) OR (`heavyforwarderhosts`) `splunkadmins_timeformat_change`\
+| rex "source(?:=|::)(?<source>[^\|]+)\|host(?:=|::)(?<host>[^\|]+)\|(?<message>[^\|]+)"\
+| eval message=coalesce(message,event_message)\
+| stats count, min(_time) AS firstSeen, max(_time) AS lastSeen by host, source, sourcetype, message\
+| eval invesMaxTime=if(firstSeen=lastSeen,lastSeen+1,lastSeen)\
+| eval invesDataSource = replace(source, "\\\\", "\\\\\\\\")\
+| eval potentialInvestigationQuery="```If no results are found, prepend the earliest=/latest= with _index_ (eg _index_earliest=...) and expand the timeframe searched over, as the parsed timestamps from the data do not have to exactly match the time the warnings appeared...``` sourcetype=\"" . sourcetype . "\" source=\"" . invesDataSource . "\" host=" . host . " earliest=" . firstSeen . " latest=" . invesMaxTime . " | eval start=substr(_raw, 0, 30) | cluster field=start"\
+| eval firstSeen=strftime(firstSeen, "%+"), lastSeen=strftime(lastSeen, "%+")\
+| fields - invesMaxTime, invesDataSource\
+| sort - count
+disabled = 1
+
+[IndexerLevel - Volume (Cold) Has Been Exceeded]
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 3
+auto_summarize.dispatch.earliest_time = -1d@h
+counttype = number of events
+cron_schedule = 54 1,5,9,13,17,21 * * *
+description = Chance the alert requires action? High. The non-hot volume has been exceeded, therefore we are deleting data before the time limit is hit...
+dispatch.earliest_time = -5h
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.charting.chart = area
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```The cold volume is causing indexes to be trimmed, this may or may not be an issue...``` \
+index=_internal `indexerhosts` sourcetype=splunkd (`splunkadmins_splunkd_source`) "Size exceeds max, will have to trim " volume!="hot"\
+| fields host, _raw
+disabled = 1
+
+[AllSplunkEnterpriseLevel - Splunk Scheduler excessive delays in executing search]
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 2
+counttype = number of events
+cron_schedule = 57 6 * * 2
+description = Chance the alert requires action? Moderate. Long latency delays in scheduled searches may indicate an issue; however, the scheduled time of the search is what determines the search window. Therefore this only shows when it has taken a long period of time to execute an actual search (there is another alert for skipped searches)
+dispatch.earliest_time = -1w
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.charting.chart = line
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```These searches were scheduled to run at a particular time but the actual run time was more than X seconds later, which might indicate a search head performance issue \
+In the environment this search was created in, restarts are not overly common, so we assume any restart might relate to a scheduling delay in order to prevent false alarms from the alert. This may need further tuning, or you may wish to remove the where clause in this if you want more alerting.```\
+index=_internal `splunkenterprisehosts` sourcetype=scheduler app=* scheduled_time=* source=*scheduler.log\
+| search ```Exclude time periods where shutdowns were occurring``` NOT [`splunkadmins_shutdown_list(splunkenterprisehosts,600,600)`]\
+| eval time=strftime(_time,"%+") \
+| eval delay_in_start = (dispatch_time - scheduled_time) \
+| where delay_in_start>100\
+| eval scheduled_time=strftime(scheduled_time,"%+") \
+| eval dispatch_time=strftime(dispatch_time,"%+") \
+| rename time AS endTime \
+| table host,savedsearch_name,delay_in_start, scheduled_time, dispatch_time, endTime, run_time, status, user, app \
+| sort -delay_in_start \
+| dedup host,savedsearch_name,delay_in_start
+disabled = 1
+
+[SearchHeadLevel - Splunk login attempts from users that do not have any LDAP roles]
+action.analyzeioc.param.verbose = 0
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 1
+counttype = number of events
+cron_schedule = 0 8 * * *
+description = Chance the alert requires action? High. These usernames have appeared in the logs but they have no mapped roles. Can be fixed by the end user? Yes
+dispatch.earliest_time = -1d
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```The listed users have attempted to log in but were unable to; it is likely they do not have any LDAP role yet and should be informed of this``` \
+index=_internal `searchheadhosts` "Couldn't find matching groups for user" OR "but none are mapped to Splunk roles" OR "SSO failed - User does not exist" (`splunkadmins_splunkd_source`) `splunkadmins_loginattempts`\
+| join user [search index=_internal `searchheadhosts` action=login status=failure reason=user-initiated OR reason=sso-failed] \
+| dedup user \
+| table _time, user, host
+disabled = 1
+
+[IndexerLevel - Buckets have being frozen due to index sizing]
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 4
+counttype = number of events
+cron_schedule = 33 3,7,11,15,19,23 * * *
+description = Chance the alert requires action? High. One or more indexes have hit the index size limit and buckets are now being frozen as a result. Note this won't work for SmartStore based indexers, refer to the alert IndexerLevel - Buckets have being frozen due to index sizing SmartStore
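+# A hypothetical indexes.conf sketch for context: buckets can freeze before
+# frozenTimePeriodInSecs when a size limit such as maxTotalDataSizeMB (or a volume limit)
+# is hit first; the index name and values below are illustrative only:
+#   [example_index]
+#   frozenTimePeriodInSecs = 7776000
+#   maxTotalDataSizeMB = 500000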
+dispatch.earliest_time = -5h
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.charting.chart = area
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```The indexer is freezing buckets due to disk space pressure before the frozenTimePeriodInSecs limit has been reached, this could be a problem if it is not expected...\
+_introspection defaults to size based so exclude it``` \
+index=_internal `indexerhosts` sourcetype=splunkd (`splunkadmins_splunkd_source`) BucketMover "will attempt to freeze" NOT "because frozenTimePeriodInSecs=" \
+`splunkadmins_bucketfrozen`\
+| rex field=bkt "(rb_|db_)(?P<newestDataInBucket>\d+)_(?P<oldestDataInBucket>\d+)"\
+| eval newestDataInBucket=strftime(newestDataInBucket, "%+"), oldestDataInBucket = strftime(oldestDataInBucket, "%+") \
+| eval message=coalesce(message,event_message)\
+| table message, oldestDataInBucket, newestDataInBucket
+disabled = 1
+
+[IndexerLevel - Indexer Out Of Disk Space]
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 5
+counttype = number of events
+cron_schedule = 4,19,34,49 * * * *
+description = Chance the alert requires action? High. The indexer has run out of disk space while attempting to write to the filesystem
+dispatch.earliest_time = -15m
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.charting.chart = area
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```The indexer has run out of disk space, this requires immediate investigation...``` \
+index=_internal "event=onFileWritten err=\"disk out of space\"" OR "event=replicationData status=failed err=\"onFileWritten failed\"" `indexerhosts` (`splunkadmins_splunkd_source`) sourcetype=splunkd \
+| top host
+disabled = 1
+
+[AllSplunkEnterpriseLevel - Core Dumps Disabled]
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 1
+counttype = number of events
+cron_schedule = 27 7 * * *
+description = Chance the alert requires action? Moderate. Core Dumps are disabled and this may make support cases more difficult, as sometimes the core dump is required for troubleshooting purposes. This is replaced by MonitoringConsole - Check OS ulimits via REST if you would like to use REST instead...
+dispatch.earliest_time = -24h
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.charting.chart = area
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```Core dumps are disabled; if a crash occurs the Splunk Support team might not be able to assist without the core dump\
+https://answers.splunk.com/answers/223838/why-are-my-ulimits-settings-not-being-respected-on.html applies to core limits, so if the server has been rebooted the init.d script may need a ulimit -Hc/Sc setting for this as well... Systemd controlled Splunk instances have limits within the unit file```\
+index=_internal "WARN ulimit" "Core file generation disabled" `splunkenterprisehosts` sourcetype=splunkd (`splunkadmins_splunkd_source`) \
+| stats max(_time) AS mostRecentlySeen by host\
+| eval mostRecentlySeen = strftime(mostRecentlySeen, "%+")
+disabled = 1
+
+[ForwarderLevel - Splunk Insufficient Permissions to Read Files]
+action.email.reportServerEnabled = 0
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 2
+counttype = number of events
+cron_schedule = 51 6 * * *
+description = Chance the alert requires action? Low. An insufficient permissions to read files error was thrown...
+dispatch.earliest_time = -1d@d
+dispatch.latest_time = @d
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.mode = fast
+display.page.search.patterns.sensitivity = 0.3
+display.page.search.tab = statistics
+display.visualizations.charting.chart = bar
+display.visualizations.show = 0
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = | tstats count groupby host, source\
+| append \
+ [ search\
+ ```This search looks for insufficient permissions errors; the problem here is that we might have insufficient permissions to read a file but might later obtain the correct permissions and then read the file (as permissions changes can happen *after* the file creation)...this is why there is both a tstats listing all files (only done because I cannot find a nicer way to do this, map is possibly more compute intensive), and then a search for files```\
+ index=_internal "Insufficient permissions to read file" sourcetype=splunkd (`splunkadmins_splunkd_source`) OR (`splunkadmins_splunkuf_source`)\
+ | rex "\(hint: (?P<hint>[^\)]+)"\
+ | stats min(_time) AS firstSeen, max(_time) AS mostRecent, values(hint) AS hint by file, host \
+ | rex field=file "'(?P<source>[^']+)'" \
+ | eval insufficientpermissions="true" \
+ | fields firstSeen, mostRecent, source, host, insufficientpermissions, hint] \
+| search ```Ignore any files requested by the macro, i.e. source!= or host!= or similar...``` `splunkadmins_permissions`\
+| stats sum(count) AS count, min(firstSeen) AS firstSeen, max(mostRecent) AS mostRecent, values(insufficientpermissions) AS insufficientpermissions, values(hint) AS hint by host, source \
+| search ```If we have an insufficient permissions error, did we see no data from our tstats command?``` insufficientpermissions="true" NOT count=* `splunkadmins_insufficient_permissions`\
+ ```At this point, if we see an insufficient permissions line and we cannot see a result from the tstats showing indexed data from that file, then we have an issue; if not, there is no issue with permissions! \
+ Insufficient permissions to read file + hint: No such file or directory when the file exists on a Splunk enterprise instance might require TAILING_SKIP_READ_CHECK = 1 in the splunk-launch.conf; refer to Splunk support for more info``` \
+| eval invesSource=replace(source, "\\\\", "\\\\\\\\") \
+| addinfo \
+| eval investigationQuery="index=_internal \"Insufficient permissions to read file\" sourcetype=splunkd (`splunkadmins_splunkd_source`) OR (`splunkadmins_splunkuf_source`) earliest=" . info_min_time . " latest=" . info_max_time . " host=" . host . " file=\"'" . invesSource . "'\""\
+| eval firstSeen=strftime(firstSeen, "%+"), mostRecent=strftime(mostRecent, "%+") \
+| fields host, source, firstSeen, mostRecent, hint, investigationQuery
+disabled = 1
+
+[AllSplunkLevel - TCP Output Processor has paused the data flow]
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 2
+counttype = number of events
+cron_schedule = 44 * * * *
+description = Chance the alert requires action? Low. A potential indicator of poor index performance or an overloaded forwarder
+dispatch.earliest_time = -1h
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.charting.chart = area
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```A paused TCP output processor is a potential indicator of an index performance issue; you may wish to ignore the shorter pause times such as 10 seconds if this is creating too many alerts...\
+On the indexer side you may see 'WARN TcpInputProc - Stopping all listening ports. Queues blocked for more than...' OR 'WARN TcpInputProc - Started listening on tcp ports. Queues unblocked', if there are indexer performance issues...```\
+index=_internal "paused" "data flow" sourcetype=splunkd (`splunkadmins_splunkd_source`) OR (`splunkadmins_splunkuf_source`) `splunkadmins_tcpoutput_paused`\
+| search ```Exclude shutdown times``` NOT [`splunkadmins_shutdown_time(indexerhosts,60,60)`]\
+| rex "has been blocked for (blocked_seconds=)?(?P<timeperiod>\d+)"\
+| eval message=coalesce(message,event_message)\
+| stats count, min(_time) AS firstSeen, max(_time) AS lastSeen, first(message) AS message, max(timeperiod) AS maxInSeconds, avg(timeperiod) AS avgTimePeriod by host\
+| eval firstSeen=strftime(firstSeen, "%+"), lastSeen=strftime(lastSeen, "%+"), avgTimePeriod=round(avgTimePeriod)
+disabled = 1
+
+[IndexerLevel - These Indexes Are Approaching The warmDBCount limit]
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 1
+counttype = number of events
+cron_schedule = 48 6 * * 1
+description = Chance the alert requires action? Moderate. Buckets are either now rolling or will roll to cold due to the bucket count limit in warm being reached; this may need adjustment
+dispatch.earliest_time = -90d@d
+dispatch.latest_time = now
+display.events.fields = ["source","sourcetype","host"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.charting.chart = bar
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = | dbinspect index=* state=warm\
+| search ```Once the warmdb bucket count is reached then the buckets are moved to cold; this may be an issue if incorrectly configured, so this alert warns in advance if we get close to the limit\
+This might be a bug in 6.5.2 but the buckets are printed twice by dbinspect in some cases...``` `splunkadmins_warmdbcount`\
+| dedup bucketId, splunk_server\
+| stats count AS theCount by index, splunk_server\
+| stats avg(theCount) AS averageCount, max(theCount) AS maxCount, min(theCount) AS minCount, values(splunk_server) by index\
+| eval averageCount = round(averageCount)\
+| join index [| rest /services/data/indexes datatype=all \
+ | dedup title \
+ | rename title AS index \
+ | table index, maxWarmDBCount]\
+| eval percUsed = (100/maxWarmDBCount)*averageCount\
+| where percUsed > `splunkadmins_warmdbcount_perc`
+disabled = 1
+
+[ForwarderLevel - SplunkStream Errors]
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 3
+counttype = number of events
+cron_schedule = 9 11 * * *
+description = Chance the alert requires action? Moderate. Errors from the Splunk stream forwarders will normally require an action. Note that this search assumes your search heads are the ones hosting the stream application... you may need to customise this.
+dispatch.earliest_time = -24h
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.charting.chart = area
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = index=_internal source="*streamfwd.log" ERROR OR FATAL `splunkadmins_streamerrors`\
+| search ```Exclude time periods where shutdowns were occurring``` NOT [`splunkadmins_shutdown_time(searchheadhosts,0,0)`]\
+| cluster showcount=true\
+| fields host, _raw
+disabled = 1
+
+[SearchHeadLevel - LDAP users have been disabled or left the company cleanup required]
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 2
+counttype = number of events
+cron_schedule = 33 11 * * *
+description = Chance the alert requires action? High. These users have been disabled or left the company but their user files are on the filesystem and this is therefore triggering warnings or errors in the Splunk logs; please clean up the old user files for these users.\
+A separate alert should exist for orphaned searches...
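+# A hedged companion query (an assumption, not part of this alert): list the users Splunk
+# still knows about via REST, to compare against the $SPLUNK_HOME/etc/users directories:
+#   | rest /services/authentication/users splunk_server=local | table title, roles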
+dispatch.earliest_time = -24h@h
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.charting.chart = area
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```If we see a failed to get LDAP user 'username' from any configured servers then that is a sign the user is no longer in the company. However if there is also a message of Couldn't find matching groups for that same user it is more likely that they exist but just do not have access to Splunk\
+If you see this alert fire, then you probably need to clean up the (for example) /opt/splunk/etc/users/... directory on each search head due to a user leaving/becoming disabled in LDAP. Alternatively they have a savedsearch/dashboard that you can find in the .meta files on the search head(s)```\
+index=_internal `searchheadhosts` "Failed to get LDAP user=\"" OR "Couldn't find matching groups for user=" OR (HTTPAuthManager "SSO failed - User does not exist") sourcetype=splunkd (`splunkadmins_splunkd_source`)\
+| eval message=coalesce(message,event_message)\
+| dedup message \
+| rex "SSO failed - User does not exist: (?P<user>\S+)"\
+| stats count, values(message) AS messages, values(component) AS components, values(log_level), max(_time) AS lastSeen by user, host\
+ ```count=1 eliminates users who are failing to login...if a user is active in LDAP but fails to login we should not get a "Couldn't find matching groups for user" line in the logs\
+If we are using a single sign on system and a user without any groups attempts sign on we should see the "SSO failed - User does not exist:" message```\
+| where user!="undefined" AND user!="nobody" AND like(messages,"Failed to get LDAP user%") AND NOT like(messages,"SSO failed - User does not exist%")\
+| table user, messages, lastSeen, host\
+| eval lastSeen=strftime(lastSeen, "%+")
+disabled = 1
+
+[AllSplunkLevel - DeploymentServer Application Installation Error]
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 2
+counttype = number of events
+cron_schedule = 11 * * * *
+description = Chance the alert requires action? High. The deployment server sent out a new application but for some reason it has failed to install
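+# A hypothetical serverclass.conf sketch of the stateOnClient=noop workaround described in
+# the search comment below; the server class and app names are illustrative only:
+#   [serverClass:example_class:app:example_app]
+#   targetRepositoryLocation = $SPLUNK_HOME/etc/deployment-apps
+#   stateOnClient = noop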
+dispatch.earliest_time = -1h
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.visualizations.charting.chart = line
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```Failed to install app can appear *if* the application is installed to a particular targetRepositoryLocation (for example /opt/splunk/etc/deployment-apps or similar) and stateOnClient=noop is not added to the application within serverclass.conf```\
+index=_internal sourcetype=splunkd (`splunkadmins_splunkd_source`) OR (`splunkadmins_splunkuf_source`) ("ERROR DeployedServerclass" "name=* Failed to install") OR (DeployedApplication "Installing app=")\
+| eventstats count(eval(log_level="ERROR")) AS errorCount, count(eval(log_level="INFO")) AS successCount by host, app \
+| where errorCount>0 AND successCount<1
+disabled = 1
+
+[AllSplunkLevel - Unable To Distribute to Peer]
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 3
+counttype = number of events
+cron_schedule = 9,24,39,54 * * * *
+description = Chance the alert requires action? Low. A Splunk instance is advising that it cannot distribute to a peer node (indexer, another search head in the cluster or similar)
+dispatch.earliest_time = -15m@m
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.charting.chart = area
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```Unable to distribute to peer messages often indicate downtime or serious performance issues. The Unable to distribute to peer named status=Down scenario can also result from having many indexers and this may require an increase to the timeouts in distsearch.conf```\
+index=_internal "Unable to distribute to peer named" sourcetype=splunkd (`splunkadmins_splunkd_source`) `splunkenterprisehosts` `splunkadmins_unable_distribute_to_peer`\
+| rex "(?P<message>Unable to distribute to peer named (?P<peer>[^: ]+))"\
+| bin _time span=1m\
+| join type=outer peer\
+ [| rest /services/search/distributed/peers \
+ | fields peerName, title\
+ | rex field=title "(?P<peer>[^:]+)"\
+ | rename title AS peer ]\
+| eval targetHost=if(isnotnull(peerName),peerName,peer)\
+| search ```Exclude unable to distribute to peer messages where we sent the shutdown signal to the peer``` NOT [`splunkadmins_shutdown_list(splunkenterprisehosts,0,0)`]\
+| stats count, values(message) AS message, values(host) AS reportingHostList by _time, targetHost \
+| eval reportingHostList=mvindex(reportingHostList,0,9)\
+| sort - _time\
+| where count>1
+disabled = 1
+
+[SearchHeadLevel - Alerts that have not fired an action in X days]
+action.keyindicator.invert = 0
+alert.track = 0
+description = Report only? Yes. This report can be run to determine which alerts have not sent an alert based on the time period / amount of internal logs available... Search Head specific? 
Yes +dispatch.earliest_time = -30d@d +dispatch.latest_time = now +display.events.fields = ["host","source","sourcetype"] +display.general.type = statistics +display.page.search.tab = statistics +display.visualizations.charting.chart = line +display.visualizations.show = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = ```Attempt to find alerts that are scheduled but not firing any actions, the alerts may need further review or may no longer be required. The app regex is in here because of some creative alert naming, X:app=Y is a real alert name in my environment!```\ +index=_internal source="*scheduler.log" sourcetype=scheduler `searchheadhosts` alert_actions!="" \ +| rex ", app=\"(?P<app>[^\"]+)\","\ +| stats count by savedsearch_name, app \ +| append \ + [| rest `splunkadmins_restmacro` /servicesNS/-/-/saved/searches \ + | search actions!="summary_index" actions!="" next_scheduled_time!="" search!="| noop" \ + | table eai:acl.app, title \ + | eval fromRESTQuery=""\ + | rename title as savedsearch_name, eai:acl.app as app ]\ +| eventstats count(eval(isnotnull(fromRESTQuery))) AS restCount, count by savedsearch_name, app\ +| where restCount=1 AND count=1\ +| table savedsearch_name, app + +[SearchHeadLevel - Data Model Acceleration Completion Status] +action.keyindicator.invert = 0 +alert.track = 0 +description = Report only? Yes. The % complete of the data model which is stored on the indexer level but run from the search head level...refer to the data model dashboards for more detailed information. Search Head specific? Yes +dispatch.earliest_time = @d +dispatch.latest_time = now +display.events.fields = ["host","source","sourcetype"] +display.general.timeRangePicker.show = 0 +display.general.type = statistics +display.page.search.tab = statistics +display.visualizations.charting.chart = area +display.visualizations.show = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = | rest /services/admin/summarization by_tstats=t `splunkadmins_restmacro` count=0 \ + ```Found on https://answers.splunk.com/answers/555005/how-to-check-the-percent-of-the-dm-acceleration-co.html ```\ +| eval datamodel=replace('summary.id',"DM_".'eai:acl.app'."_","") \ +| join type=left datamodel \ + [| rest /services/data/models `splunkadmins_restmacro` count=0 \ + | table title acceleration.cron_schedule eai:digest \ + | rename title as datamodel \ + | rename acceleration.cron_schedule AS cron] \ +| table datamodel eai:acl.app summary.access_time summary.is_inprogress summary.size summary.latest_time summary.complete summary.buckets_size summary.buckets cron summary.last_error summary.time_range summary.id summary.mod_time eai:digest summary.earliest_time summary.last_sid summary.access_count \ +| rename summary.id AS summary_id, summary.time_range AS retention, summary.earliest_time as earliest, summary.latest_time as latest, eai:digest as digest \ +| rename summary.* AS *, eai:acl.* AS * \ +| sort datamodel + +[SearchHeadLevel - User - Dashboards searching all indexes] +action.keyindicator.invert = 0 +alert.suppress = 0 +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 3 +counttype = number of events +cron_schedule = 30 5 * * 1-5 +description = Chance the alert requires action? High. All dashboard panels that do not have an index= setting or use index=* are highlighted by this alert. Can be fixed by the end user? Yes. Search Head specific? 
Yes +dispatch.earliest_time = -24h +dispatch.latest_time = now +display.general.type = statistics +display.page.search.tab = statistics +display.visualizations.charting.chart = area +enableSched = 1 +quantity = 0 +relation = greater than +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = | rest /servicesNS/-/-/data/ui/views `splunkadmins_restmacro`\ +| search ```A dashboard searching all indexes is an issue just like a scheduled search querying all indexes or using the index=* trick```\ +eai:data=*query* `splunkadmins_dashboards_allindexes`\ +| regex eai:data="<search.*" \ +| rex field=eai:data "(?s)(?P<theSearch><search(?!String)[^>]*>[^<]*<query>.*?)<\/query>" max_match=200 \ +| mvexpand theSearch \ +| rex field=theSearch "(?s)<search(?P<searchInfo>[^>]*)>[^<]*<query>(?P<theQuery>.*)" \ +| search ```If we are seeing post process search then we don't want to check if it has index= because that is likely only in the base query. These are also various exclusions for legitimate searches that will not involve scanning all indexes, such as rest or a savedsearch or similar``` searchInfo!="*base*"\ +| rename eai:appName AS application, eai:acl.sharing AS sharing, eai:acl.owner AS owner, label AS name\ +| table theQuery, application, owner, sharing, name, splunk_server, title\ +| regex theQuery!="index\s*=(?!\s*\*)" \ +| regex theQuery!="^(\()?\s*(\`|\$[^|]+\$|eventtype=|<!\[CDATA\[\s*\|\s*((acl)?inputlookup|rest) |\|)"\ +| rex field=theQuery "(?s)^(?P<exampleQueryToDetermineIndexes>[^\|]+)"\ +| eval exampleQueryToDetermineIndexes=exampleQueryToDetermineIndexes . "| stats values(index) AS index | format | fields search | eval search=replace(search,\"\\)\",\"\"), search=replace(search,\"\\(\",\"\"), search=if(search==\"NOT \",\"No indexes found\",search)" +disabled = 1 + +[SearchHeadLevel - Scheduled Searches Configured with incorrect sharing] +action.email.message.report = The scheduled report '$name$' has run. Please fix the searches listed +action.email.reportServerEnabled = 0 +action.keyindicator.invert = 0 +alert.suppress = 0 +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 3 +auto_summarize.dispatch.earliest_time = -1d@h +counttype = number of events +cron_schedule = 0 5 * * * +description = Chance the alert requires action? High. These searches are triggering scripts or alerts which will provide a results link to Splunk. But the sharing is not app or global and therefore the link is unusable to anyone who is not the owner...Can be fixed by the end user? Yes. Search Head specific? 
Yes
+dispatch.earliest_time = -24h@h
+dispatch.latest_time = now
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.show = 0
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = | rest /servicesNS/-/-/saved/searches `splunkadmins_restmacro`\
+| search ```The problem with alerts that are configured privately is that no one beyond the author can use the results link and non-admins cannot even see the alert in Splunk!\
+Therefore we find anything that emails that is not shared correctly *and* anything that uses a script, as often the script will include a results link.\
+The idea here is to let the end user know so they can share it appropriately; the noop search is excluded to remove scheduled views from this list```\
+is_scheduled=1 disabled=0 eai:acl.sharing!="global" eai:acl.sharing!="app" search!="| noop" actions!="" `splunkadmins_scheduled_incorrectsharing`\
+| eval numberOfEmailed = mvcount(split('action.email.to',"@"))-1\
+| table title, eai:acl.app, author, eai:acl.sharing, actions, action.email.to, numberOfEmailed\
+| where numberOfEmailed>1 OR isnull(numberOfEmailed)\
+| sort author
+disabled = 1
+
+[SearchHeadLevel - Realtime Search Queries in dashboards]
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 0
+counttype = number of events
+cron_schedule = 11 4 * * 1
+description = Chance the alert requires action? High. Just a summary of all dashboards that use realtime searching... Search Head specific? Yes
+dispatch.earliest_time = -48h
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.charting.chart = area
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = | rest /servicesNS/-/-/data/ui/views `splunkadmins_restmacro` | regex eai:data="<(earliest|latest)(Time)?>rt"\
+| search ```Shows realtime search usage within dashboards``` eai:data=*query* `splunkadmins_realtime_dashboard`\
+| regex eai:data="<search.*"\
+| rex field=eai:data "(?s)(?P<theSearch><search(?!String)[^>]*>[^<]*<query>.*?)<\/query>" max_match=200 \
+| mvexpand theSearch \
+| rex field=theSearch "(?s)<search(?P<searchInfo>[^>]*)>[^<]*<query>(?P<theQuery>.*)" \
+| search searchInfo!="*base*" ```Exclude queries which have a base; in general they will not have an earliest/latest time so this gets confusing\
+It might be possible to use mvzip / mvexpand or mvindex to match the correct earliesttime/latesttime with each search query but it proved extremely difficult. So just keeping this as-is, as the dashboard needs to be reviewed if it has too many realtime searches anyway```\
+| rex field=eai:data "(?s)<earliest(Time)?>(?P<earliesttime>[^<]+)" max_match=200 \
+| rex field=eai:data "(?s)<latest(Time)?>(?P<latesttime>[^<]+)" max_match=200 \
+| table title, eai:appName, searchInfo, theQuery, eai:acl.owner, eai:acl.sharing, label, earliesttime, latesttime, splunk_server
+disabled = 1
+
+[AllSplunkEnterpriseLevel - Transparent Huge Pages is enabled and should not be]
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 4
+counttype = number of events
+cron_schedule = 14 2 * * *
+description = Chance the alert requires action? High. Transparent huge pages should never be enabled on a Splunk enterprise server, as per http://docs.splunk.com/Documentation/Splunk/latest/Installation/Systemrequirements
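+# A hedged companion query (an assumption; the REST alternative is also mentioned in the
+# search comment below): check the current hugepage state of the local instance and any
+# connected indexers:
+#   | rest /services/server/sysinfo | fields splunk_server, transparent_hugepages.effective_state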
+dispatch.earliest_time = -24h
+dispatch.latest_time = now
+display.events.fields = ["source","sourcetype","host"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.charting.chart = bar
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```Detect when transparent huge pages is enabled on a Linux server; it should be disabled.\
+Redhat Linux has an issue where the transparent huge pages setting changes after Splunk starts if the server was rebooted, check /sys/kernel/mm/transparent_hugepage to confirm...\
+| rest /services/server/sysinfo is an alternative if you want the current search head + indexers, but this will ignore other search heads...```\
+index=_internal "Linux transparent hugepage support, enabled=" sourcetype=splunkd (`splunkadmins_splunkd_source`) `splunkenterprisehosts` enabled!="never"\
+| eval error="This configuration of transparent hugepages is known to cause serious runtime problems with Splunk. Typical symptoms include generally reduced performance and catastrophic breakdown in system\
+responsiveness under high memory pressure. Please fix by setting the values for transparent huge pages to \"madvise\" or preferably \"never\" via sysctl, kernel boot parameters, or other method recommended by your Linux distribution."\
+| table _time, host, _raw, error
+disabled = 1
+
+[IndexerLevel - Old data appearing in Splunk indexes]
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 4
+counttype = number of events
+cron_schedule = 33 7 * * 0
+description = Chance the alert requires action? Moderate. A slightly more complex alert that attempts to find recently indexed data that is being indexed with older timestamps; an attempt to find invalid date parsing for Splunk inputs
+dispatch.earliest_time = -10y
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.charting.chart = bar
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = | tstats max(_time) AS mostRecentlySeen, max(_indextime) AS mostRecentlyIndexed, min(_time) AS earliestSeen, min(_indextime) AS earliestIndexTime , count \
+ where _index_earliest=`splunkadmins_olddata_lookback`, earliest=`splunkadmins_olddata_earliest`, latest=`splunkadmins_olddata_latest` \
+ groupby source, sourcetype, index, host\
+| search ```Find data that appears to be logged in the past, this may indicate poor timestamp parsing (or we're just ingesting really old data)``` `splunkadmins_olddata`\
+| eval invesDataSource = replace(source, "\\\\", "\\\\\\\\"), invesLatestTime=mostRecentlySeen+1, invesLatestIndexTime=mostRecentlyIndexed+1\
+| eval investigationQuery="```Narrow down to the older part of the timeline after this query runs to see the potential issue...``` index=" . index . " source=\"" . invesDataSource . "\" sourcetype=\"" . sourcetype . "\" host=" . host . " earliest=" . earliestSeen . " latest=" . invesLatestTime . " _index_earliest=" . earliestIndexTime . " _index_latest=" . invesLatestIndexTime . 
" | eval indextime=strftime(_indextime, \"%+\")" \ +| eval mostRecentlySeen=strftime(mostRecentlySeen, "%+"), mostRecentlyIndexed=strftime(mostRecentlyIndexed, "%+")\ +| sort index, host, sourcetype\ +| table index, source, sourcetype, host, mostRecentlySeen, mostRecentlyIndexed, count, investigationQuery +disabled = 1 + +[AllSplunkLevel - Splunk forwarders that are not talking to the deployment server] +action.keyindicator.invert = 0 +alert.suppress = 0 +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 3 +auto_summarize.dispatch.earliest_time = -1d@h +counttype = number of events +cron_schedule = 0 8 * * * +description = Chance the alert requires action? Moderate. All forwarders should talk to the deployment server unless they have a special reason for an exclusion... +dispatch.earliest_time = -24h +dispatch.latest_time = now +display.general.type = statistics +display.page.search.tab = statistics +enableSched = 1 +quantity = 0 +relation = greater than +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = | tstats count where index=_internal groupby host \ +| fields host \ +| search ```This is an attempt to find any universal forwarders that send data into the indexers but do not phone home to the expected deployment server``` `splunkadmins_forwarders_nottalking_ds`\ +| eval shortname=mvindex(split(host, "."), 0) \ +| eval talking=0 \ +| table shortname, host, talking \ +| append \ + [ search index=_internal `deploymentserverhosts` source="*splunkd_access.log" sourcetype=splunkd_access \ + | rex field=uri "/services/broker/phonehome/connection_[^_]+_[89][0-9]{3}_[^_]+(_[0-9][^_]+)?_(?P<hostname>[^_]+)_" \ + | eval host=hostname \ + | eval shortname=mvindex(split(host, "."), 0) \ + | eval talking=1 \ + | dedup shortname, host, talking \ + | table shortname, host, talking]\ +| append\ + [ search index=_internal `deploymentserverhosts` source="*splunkd_access.log" sourcetype=splunkd_access\ + | rex field=uri "/services/broker/phonehome/connection_(?P<ipaddr>[^_]+)_[89][0-9]{3}_[^_]+(_[0-9][^_]+)?_[^_]+_"\ + | rename ipaddr AS host\ + | eval shortname=host\ + | eval talking=1\ + | dedup shortname, host, talking\ + | table shortname, host, talking]\ +| reverse | dedup shortname, host \ +| search NOT (`splunkenterprisehosts`) \ +| search talking=0 \ +| fields - talking \ +| lookup dnslookup clienthost AS host \ +| search clientip!='' +disabled = 1 + +[AllSplunkEnterpriseLevel - sendmodalert errors] +action.keyindicator.invert = 0 +alert.suppress = 0 +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 1 +alert_condition = search savedsearch_name=* +counttype = custom +cron_schedule = */15 * * * * +description = Chance the alert requires action? Low. 
sendmodalert errors from Splunk might advise of a failure in an alert action, also see SearchHeadLevel - Script failures in the last day +dispatch.earliest_time = -15m +dispatch.latest_time = now +display.events.fields = ["host","source","sourcetype"] +display.general.type = statistics +display.page.search.tab = statistics +display.visualizations.charting.chart = area +enableSched = 1 +quantity = 0 +relation = greater than +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = ```sendmodalert or sendalert errors and warnings may be an issue relating to the creation of alerts via a script```\ +index=_internal `splunkenterprisehosts` ("ERROR sendmodalert" action) OR ("WARN sendmodalert" action) OR "Error in 'sendalert' command" sourcetype=splunkd (`splunkadmins_splunkd_source`)\ +```If you need more context on the above errors add this snippet into the above search:\ +OR "sendmodalert - Invoking modular alert action"\ +```\ +`splunkadmins_sendmodalert_errors`\ +| rex field=results_file "[/\\\]dispatch[/\\\](?P<sid>[^/]+)"\ +| eval sid=if(isnull(sid),"NOMATCH",sid)\ +| join sid type=outer [search index=_internal source="*scheduler.log" sourcetype=scheduler `splunkenterprisehosts` | table sid, savedsearch_name, app, user]\ +| cluster showcount=true\ +| table host, savedsearch_name, app, user, _raw, _time, cluster_count\ +| eval mostRecent = strftime(mostRecent, "%+")\ +| sort - _time +disabled = 1 + +[IndexerLevel - Indexer not accepting TCP Connections] +action.keyindicator.invert = 0 +alert.suppress = 0 +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 3 +counttype = number of events +cron_schedule = 3,18,33,48 * * * * +description = Chance the alert requires action? Low. The indexer is either overloaded or down and not accepting TCP connections... +dispatch.earliest_time = -15m@m +dispatch.latest_time = now +display.events.fields = ["host","source","sourcetype"] +display.general.type = statistics +display.page.search.tab = statistics +enableSched = 1 +quantity = 0 +relation = greater than +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = ```The indexer not accepting TCP connections is either a serious performance issue or downtime``` \ +index=_internal TcpOutputFd "connection refused" sourcetype=splunkd (`splunkadmins_splunkd_source`) OR (`splunkadmins_splunkuf_source`) \ +| rex "Connect to (?P<clientip>[^:]+)" \ +| top clientip \ +| lookup dnslookup clientip \ +| where count>10 +disabled = 1 + +[IndexerLevel - Buckets rolling more frequently than expected] +action.keyindicator.invert = 0 +alert.suppress = 0 +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 1 +counttype = number of events +cron_schedule = 8 8 * * * +description = Chance the alert requires action? Moderate. Indexer level issues - Buckets are moving out of the warm state quicker than expected and this may (or may not) be an issue, this could indicate that hot is undersized or there are too many buckets in the warm area. 
In Splunk 7.2 the monitoring console has introduced the health warning "The percentage of small buckets created (x) over the last hour is very high"; this new warning will likely replace this alert
+dispatch.earliest_time = -24h@h
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.charting.chart = bar
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```Buckets are moving from warm to cold very quickly and this could be an issue related to the sizing not being valid for the indexes...``` \
+`indexerhosts` index=_internal "Will chill bucket" (`splunkadmins_splunkd_source`) sourcetype=splunkd "/db/db" \
+| rex "=/.*?(?P<indexname>[^/]+)(/[^/]+){2} " \
+| stats count by indexname \
+| sort - count \
+| where (count>`splunkadmins_bucketrolling_count`)
+disabled = 1
+
+[ForwarderLevel - Read operation timed out expecting ACK]
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 2
+counttype = number of events
+cron_schedule = 47 */2 * * *
+description = Chance the alert requires action? Low. Acknowledgement from the indexers should ideally never time out; the timeout may cause duplication issues
+dispatch.earliest_time = -2h
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.charting.chart = area
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```This read operation timed out expecting ACK will likely result in the forwarder re-sending at least some data to the indexer. This can be caused by lack of CPU on the forwarder and potentially other issues...```\
+index=_internal "Read operation timed out expecting ACK from" sourcetype=splunkd (`splunkadmins_splunkd_source`) OR (`splunkadmins_splunkuf_source`)\
+| rex "from (?P<indexer>\S+)"\
+| stats count, max(_time) AS mostRecent by host, indexer\
+| eval mostRecent=strftime(mostRecent, "%+")\
+| search ```Allow exclusions such as ignoring a count per host or similar...``` `splunkadmins_readop_expectingack`
+disabled = 1
+
+[AllSplunkEnterpriseLevel - Replication Failures]
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 4
+counttype = number of events
+cron_schedule = */15 * * * *
+description = Chance the alert requires action? Moderate. Replication failures often show a search head that is having issues after an indexer restart; the search head might require a restart or further investigation to resolve this.
+dispatch.earliest_time = -15m
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype","splunk_server"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.charting.chart = area
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```Replication status failure on a search head; the search head may require a restart or investigation...```\
+index=_internal "because replication was unsuccessful. 
replicationStatus Failed failure info:" OR "replicateDelta: failed for" OR (ERROR DistributedBundleReplicationManager) OR (WARN DistributedBundleReplicationManager NOT "however it took too long") ```These are deprecated in Splunk 9, NOT "replicationWhitelist in distsearch.conf is deprecated" NOT "replicationBlacklist in distsearch.conf is deprecated". This is confirmed as an invalid warning message in Splunk 9``` NOT "Failed to touch bundle=, checksum=0 (manual preparation): No such file or directory" sourcetype=splunkd (`splunkadmins_splunkd_source`) `splunkenterprisehosts` `splunkadmins_repfailures`\ +| search ```Exclude shutdown times``` NOT [`splunkadmins_shutdown_keyword(indexerhosts,180,180)`]\ +| eval event_message=coalesce(event_message,message) \ +| stats count, max(_time) AS mostRecent by host, event_message \ +| sort - mostRecent \ +| eval mostRecent=strftime(mostRecent, "%+") \ +| where (match(event_message, "No auth token for peer") AND count>1) OR NOT (match(event_message, "No auth token for peer")) \ +| eval search_head=host \ +| eval search_head_cluster=`search_head_cluster` \ +| fields - search_head +disabled = 1 + +[AllSplunkEnterpriseLevel - Low disk space] +action.keyindicator.invert = 0 +alert.suppress = 0 +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 4 +counttype = number of events +cron_schedule = */5 * * * * +description = Chance the alert requires action? Moderate. Low disk space on one or more partitions of the Splunk enterprise servers... +dispatch.earliest_time = -5m +dispatch.latest_time = now +display.events.fields = ["host","source","sourcetype","splunk_server"] +display.general.type = statistics +display.page.search.tab = statistics +display.visualizations.charting.chart = area +enableSched = 1 +quantity = 0 +relation = greater than +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = ```Use introspection data to monitor Splunk mount points, if you want to monitor non-Splunk directories use nmon or another monitoring system```\ +index=_introspection host=* component=Partitions `splunkadmins_lowdisk`\ +| eval available='data.available', capacity='data.capacity', mount_point='data.mount_point'\ +| eval percfree = round((available/capacity)*100,2)\ +| stats min(percfree) AS percfree, min(available) AS minMBAvailable by mount_point, host\ +| search ```Below 10% (default only, can be changed in the macro) is an issue unless it's an indexer, as 10% of the indexer is actually a very large amount of data...```\ +(percfree<`splunkadmins_lowdisk_perc` NOT (`indexerhosts`)) OR (minMBAvailable<`splunkadmins_lowdisk_mb` (`indexerhosts`)) +disabled = 1 + +[AllSplunkEnterpriseLevel - KVStore Process Terminated] +action.keyindicator.invert = 0 +alert.suppress = 0 +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 5 +counttype = number of events +cron_schedule = */15 * * * * +description = Chance the alert requires action? High. Ideally you shouldn't see this error... 
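+# A hedged companion query (an assumption, not part of this alert): confirm the current
+# KV store state after this alert fires:
+#   | rest /services/server/info splunk_server=local | fields splunk_server, kvStoreStatus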
+dispatch.earliest_time = -15m
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype","splunk_server"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.charting.chart = area
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```This ideally should never happen during normal runtime...```\
+index=_internal `searchheadhosts` "KV Store process terminated" sourcetype=splunkd (`splunkadmins_splunkd_source`) `splunkadmins_kvstore_terminated`\
+| fields _time, host, _raw
+disabled = 1
+
+[AllSplunkEnterpriseLevel - File integrity check failure]
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 1
+counttype = number of events
+cron_schedule = 33 9 * * *
+description = Chance the alert requires action? Moderate. A file integrity check failure would generally mean a change has been made to parts of Splunk that will be wiped out on the next upgrade
+dispatch.earliest_time = -24h
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype","splunk_server"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.charting.chart = area
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```One or more files did not pass the startup hash-check against the Splunk-provided manifest; you can tune limits.conf to control whether or not the warning is logged```\
+index=_internal `splunkenterprisehosts` "An installed * did not pass hash-checking due to" (`splunkadmins_splunkd_source`) sourcetype=splunkd `splunkadmins_fileintegritycheck`\
+| eval message=coalesce(message,event_message)\
+| stats count, latest(_time) AS lastSeen by message, host\
+| eval lastSeen=strftime(lastSeen, "%+")
+disabled = 1
+
+[AllSplunkEnterpriseLevel - WARN iniFile Configuration Issues]
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 2
+counttype = number of events
+cron_schedule = 0 4 * * 1
+description = Chance the alert requires action? Low. Detect configuration errors in the files that the indexer cluster or enterprise servers are throwing warnings about
+dispatch.earliest_time = -1w
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype","splunk_server"]
+display.visualizations.charting.chart = area
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```Find potentially invalid configuration within the Splunk applications on search heads/indexers and warn about this...``` \
+index=_internal WARN IniFile `splunkenterprisehosts` sourcetype=splunkd (`splunkadmins_splunkd_source`) `splunkadmins_warninifile`\
+| cluster showcount=true\
+| fields _time, host, cluster_count, _raw
+disabled = 1
+
+[SearchHeadLevel - Long filenames may be causing issues]
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 2
+counttype = number of events
+cron_schedule = 56 5 * * *
+description = Chance the alert requires action? Moderate. There are one or more dashboards or alerts with a filename long enough to cause errors in the archive processor; the exact implications are unknown, but the alert/dashboard may need to be removed.
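+# To locate the offending files before removing the alert/dashboard, a filesystem search for
+# path components of 100+ characters works; a sketch assuming GNU find (adjust paths as needed):
+#   find $SPLUNK_HOME/etc/apps $SPLUNK_HOME/etc/users -regextype posix-extended -regex '.*/[^/]{100,}'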
+dispatch.earliest_time = -24h
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.visualizations.charting.chart = area
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```Detect the issue where someone has created a filename longer than 100 characters and the cluster is having issues with replication. The 100 character issue was confirmed in Splunk 6.5.2 during a support case```\
+index=_internal `searchheadhosts` (ArchiveFile "Failed to write archive header for" "Pathname too long") OR ("ERROR Archiver" "Unable to add entry") (`splunkadmins_splunkd_source`) sourcetype=splunkd \
+| cluster showcount=true\
+| fields _time, _raw, cluster_count
+disabled = 1
+
+[IndexerLevel - Large multiline events using SHOULD_LINEMERGE setting]
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 2
+counttype = number of events
+cron_schedule = 42 7 * * *
+description = Chance the alert requires action? Moderate. This alert advises that a multi-line event is appearing in Splunk that is large enough that the default SHOULD_LINEMERGE = true setting may cause blocking in the indexer aggregation queue; it's much more efficient to configure SHOULD_LINEMERGE = false and LINE_BREAKER = ... if possible. Note the TRUNCATE setting will likely need to be much larger to deal with the LINE_BREAKER change.\
+Please update the props.conf for this sourcetype to use LINE_BREAKER if applicable (and the TRUNCATE setting).
+dispatch.earliest_time = -24h
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype","splunk_server"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.chartHeight = 628
+display.visualizations.charting.chart = area
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = | tstats max(linecount) AS maxLineCount, min(_time) AS firstSeen, max(_time) AS mostRecent, values(host) AS hosts, count AS occurrenceCount where index=*, linecount>250 groupby sourcetype\
+| search ```This search detects sourcetypes with greater than 250 lines which have SHOULD_LINEMERGE set to true; this might cause blocking in the indexer aggregation queue if there are a large number\
+of events with hundreds of lines or very large events such as >5000 lines of data. This alert is designed to give hints about where SHOULD_LINEMERGE=false / LINE_BREAKER=... might be more appropriate\
+Note that the REST API will return every instance of sourcetype, it's not quite as accurate as btool so this can generate false alarms if there are multiple props.conf definitions of a sourcetype```\
+`splunkadmins_multiline_linemerge`\
+| join [| rest `splunkindexerhostsvalue` /servicesNS/-/-/configs/conf-props\
+| fields title SHOULD_LINEMERGE\
+| search SHOULD_LINEMERGE = 1\
+| dedup title | rename title AS sourcetype]\
+| where maxLineCount > 260 AND occurrenceCount>30\
+| eval hostList=if(mvcount(hosts)>1,mvjoin(hosts," OR host="),hosts)\
+| eval hostList="host=" . hostList\
+| eval investigationQuery="index=* sourcetype=" . sourcetype . " " . hostList . " linecount>250 earliest=" . firstSeen . " latest=" . 
mostRecent\
+| sort - occurrenceCount, maxLineCount\
+| table sourcetype, maxLineCount, occurrenceCount, investigationQuery
+disabled = 1
+
+[IndexerLevel - Data parsing error]
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 4
+counttype = number of events
+cron_schedule = */10 * * * *
+description = Chance the alert requires action? High. This alert advises there is an error with the LINE_BREAKER or Aggregator; this generally relates to a misconfiguration that requires a fix...
+dispatch.earliest_time = -10m
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype","splunk_server"]
+display.visualizations.chartHeight = 628
+display.visualizations.charting.chart = area
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```LineBreakingProcessor ERRORs are usually related to misconfiguration/errors in the LINE_BREAKER= setup in props.conf and are therefore an issue. This version of the Aggregator error often relates to date/time config files```\
+index=_internal (WARN CsvLineBreaker) OR ("ERROR" ("JsonLineBreaker" OR "LineBreakingProcessor" OR "AggregatorMiningProcessor") NOT "WARN AggregatorMiningProcessor") sourcetype=splunkd (`splunkadmins_splunkd_source`) (`indexerhosts`) OR (`heavyforwarderhosts`) `splunkadmins_dataparsing_error` \
+| rex "(?s)^(\S+\s+){3}(?P<error>.*)"\
+| stats count, latest(_time) AS mostrecent, earliest(_time) AS firstseen, values(host) AS hosts by error\
+| eval mostrecent=strftime(mostrecent, "%+"), firstseen=strftime(firstseen, "%+")\
+| table count, mostrecent, firstseen, hosts, error
+disabled = 1
+
+[SearchHeadLevel - KVStore Or Conf Replication Issues Are Occurring]
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 4
+counttype = number of events
+cron_schedule = 12,22,32,42,52,02 * * * *
+description = Chance the alert requires action? High. If the KVStore is out of sync or the search head is out of sync it will likely require a manual resync/clean to get it working as expected\
+If it relates to a conf replication issue it is likely a problematic search head requiring a restart, or it may require a force sync... (the logs will advise on this)\
+To remove false alarms this alert now checks whether any shutdown messages appear; this may require tweaking in your environment as it checks for *any* search head shutdown...
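+# The two manual fixes the description refers to are both CLI driven; a sketch, run on the
+# problematic member only (commands are from the Splunk docs linked in the search below):
+#   $SPLUNK_HOME/bin/splunk resync shcluster-replicated-config        # conf replication out of sync
+#   $SPLUNK_HOME/bin/splunk clean kvstore --local                      # KV store out of sync (run while stopped)
+# The KV store clean triggers a resync from the other members on restart.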
+dispatch.earliest_time = -10m
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.visualizations.chartHeight = 628
+display.visualizations.charting.chart = area
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```Detect search head issues related to extended search head downtime, in particular ConfReplication issues or KV store replication issues\
+KVStore - http://docs.splunk.com/Documentation/Splunk/latest/Admin/ResyncKVstore , ConfReplication - http://docs.splunk.com/Documentation/Splunk/latest/DistSearch/HowconfrepoworksinSHC#Replication_synchronization_issues\
+"The search head cluster captain is disconnected" can relate to a SH cluster restart *or*, if outside a rolling restart, it may require a restart of the problematic search head...\
+In addition to this you could also look for "Error pushing configurations to captain" consecutiveErrors>1 , this would also hint at a potential issue although a small number of consecutive errors appears to be normal...\
+If you see the message "Consider performing a destructive configuration resync on this search head cluster member", then it's a real issue and often requires manual intervention...``` \
+index=_internal `searchheadhosts` "Local KV Store has replication issues" OR ("ConfReplicationThread" "captain") OR ("SHCMasterHTTPProxy" "Low Level http request" NOT "did not satisfy regex" NOT "does not exist" NOT "peer already has artifact" NOT "has inflight replications" NOT ("failed on report target request" "not found")) sourcetype=splunkd (`splunkadmins_splunkd_source`) \
+| regex "\S+\s+\S+\s+\S+\s+(ERROR|WARN)" \
+| eval search_head=host \
+| eval search_head_cluster=`search_head_cluster` \
+| search ```Exclude time periods where shutdowns were occurring``` NOT [`splunkadmins_shutdown_time(searchheadhosts,0,0)`]\
+| cluster showcount=true t=0.93 labelonly=t \
+| fillnull value=0 consecutiveErrors \
+| stats min(_time) AS firstSeen, max(_time) AS mostRecent, values(_raw) AS _raw, max(cluster_count) AS cluster_count, max(consecutiveErrors) AS consecutiveErrors by host, cluster_label, search_head_cluster \
+| eval firstSeen=strftime(firstSeen, "%+"), mostRecent=strftime(mostRecent, "%+") \
+| where (match(_raw, "Error pushing configurations") AND consecutiveErrors>4) OR (match(_raw, "Error pulling configurations") AND consecutiveErrors>2) OR NOT match(_raw, "Error (pushing|pulling) configurations") \
+| fields - cluster_label, consecutiveErrors
+disabled = 1
+
+[SearchHeadLevel - SHCluster Artifact Replication Issues]
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 4
+counttype = number of events
+cron_schedule = 54 * * * *
+description = In this scenario either something has changed or one or more search heads are not syncing the artifacts as expected; a restart of the SH cluster usually resolves this.
+dispatch.earliest_time = -1h
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.charting.chart = line
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```When this issue occurs it is likely related to some kind of issue post-restart of the indexer/search head cluster.
Restarting the search head cluster appears to resolve the issue in 6.5.2```\
+index=_internal "ERROR SHCArtifactId" "This GUID does not match the member's current GUID" sourcetype=splunkd (`splunkadmins_splunkd_source`) `searchheadhosts` \
+| eventstats max(_time) AS lasterror, min(_time) AS firsterror\
+| cluster showcount=true \
+| table host, cluster_count, _raw, lasterror, firsterror\
+| eval lasterror = strftime(lasterror, "%+"), firsterror = strftime(firsterror, "%+")
+disabled = 1
+
+[LicenseMaster - Duplicated License Situation]
+alert.suppress = 0
+alert.track = 1
+alert.severity = 5
+counttype = number of events
+cron_schedule = 27 * * * *
+description = Chance the alert requires action? High. A duplicated licensing situation will normally require intervention to fix; this scenario can happen when a forwarder is not using the forwarder license or license master but is sending data into the indexers...\
+Another example can occur when the forwarder cannot connect to the remote license server, example log:\
+02-23-2024 02:24:03.876 +0000 WARN DistributedPeerManager [76946 DistributedPeerMonitorThread] - Duplicated License situation happen on peer=1531FB3B-6C49-46F8-BDA3-36B20ED7CD13 (losttext) because it has the same license key=B8632644E0D448E1BB9835BC6055FEEE33476A6A52AB5DEFBFDB3AB55A1983A7 as this searchhead=examplesearchhead but searchhead's license master=05FCEBDC-5184-3457-BECF-C0B4AFB538CA != peer's license_master=1531FB3B-6C49-46F8-BDA3-36B20ED7CD13 please fix this issue in 72 hours, otherwise search will be disabled
+dispatch.earliest_time = -1h
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype","splunk_server"]
+display.visualizations.chartHeight = 628
+display.visualizations.charting.chart = bar
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```This warning will appear in the messages console of the license master; considering you have 72 hours to fix the issue you might want to be alerted about this so it can be fixed promptly! The scenario is likely to occur when a heavy forwarder is sending data into the indexers without using the forwarder license or talking to the cluster master.```\
+index=_internal `licensemasterhost` "Duplicated License situation happen" (`splunkadmins_splunkd_source`) \
+| cluster showcount=true \
+| fields _time, host, source, _raw, cluster_count
+disabled = 1
+
+[DeploymentServer - Unsupported attribute within DS config]
+alert.suppress = 0
+alert.track = 1
+alert.severity = 4
+counttype = number of events
+cron_schedule = 48 */4 * * *
+description = Chance the alert requires action? High. A syntax error from manually editing the serverclass.conf will normally need to be fixed. Note that this alert is only useful if you are manually editing your serverclass.conf; if you only use the GUI then it is unlikely that this alert will ever be triggered.
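+# When hand-editing serverclass.conf, the syntax can be validated offline before the warning
+# ever appears; a sketch using the built-in btool (the typo'd attribute is usually visible in
+# the check output):
+#   $SPLUNK_HOME/bin/splunk btool check
+#   $SPLUNK_HOME/bin/splunk btool serverclass list --debug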
+dispatch.earliest_time = -4h@m
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.chartHeight = 628
+display.visualizations.charting.chart = bar
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```A DS_DC_Common warning normally relates to a typo within the serverclass.conf, most likely due to manual editing; this will prevent the GUI from being used for the forwarder management configuration...```\
+index=_internal "WARN" "DS_DC_Common" `deploymentserverhosts` sourcetype=splunkd (`splunkadmins_splunkd_source`)\
+| eval message=coalesce(message,event_message)\
+| stats first(_time) AS lastSeen by message, host\
+| eval lastSeen=strftime(lastSeen, "%+")\
+| table lastSeen, message, host
+disabled = 1
+
+[AllSplunkEnterpriseLevel - TCP or SSL Config Issue]
+alert.severity = 4
+alert.suppress = 0
+alert.track = 1
+counttype = number of events
+cron_schedule = 53 * * * *
+description = Chance the alert requires action? High. Since the TCP listener port may not be working as expected you likely wish to run the appropriate checks on the forwarder/indexer
+dispatch.earliest_time = -60m@m
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.chartHeight = 628
+display.visualizations.charting.chart = bar
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```A TcpInputConfig or SSLCommon error likely indicates a misconfiguration of a heavy forwarder or indexer; this may prevent the listening port from working as expected```\
+index=_internal ERROR "TcpInputConfig" OR "SSLCommon" OR "Could not bind" sourcetype=splunkd (`splunkadmins_splunkd_source`) (`indexerhosts`) OR (`heavyforwarderhosts`) \
+| eval message=coalesce(message,event_message)\
+| stats first(_time) AS mostRecent by host, source, sourcetype, message\
+| table host, message, mostRecent\
+| eval mostRecent=strftime(mostRecent, "%+")
+disabled = 1
+
+[IndexerLevel - Peer will not return results due to outdated generation]
+alert.severity = 4
+alert.suppress = 0
+alert.track = 1
+counttype = number of events
+cron_schedule = */15 * * * *
+description = Chance the alert requires action? High. In general this error should not appear for a long period of time, so if it does there is likely an issue. Note this can be replaced by "AllSplunkEnterpriseLevel - Losing Contact With Master Node"
+dispatch.earliest_time = -15m
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.chartHeight = 628
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```If the error "Peer ... will not return any results for this search, because the search head is using an outdated generation" appears, then either the peer requires a restart or there is another issue here.
Assuming the issue does not resolve itself quickly...```\
+index=_internal sourcetype=splunkd `splunkadmins_splunkd_source` `indexerhosts` "because the search head is using an outdated generation"\
+| eval message=coalesce(message,event_message)\
+| stats count, min(_time) AS firstSeen, max(_time) AS lastSeen, values(host) AS host by message\
+| eval diff=lastSeen-firstSeen\
+| where diff>60\
+| eval firstSeen = strftime(firstSeen, "%+"), lastSeen=strftime(lastSeen, "%+") \
+| table host, message, firstSeen, lastSeen, count
+disabled = 1
+
+[SearchHeadLevel - Scheduled searches failing in cluster with 404 error]
+alert.severity = 4
+alert.suppress = 0
+alert.track = 1
+counttype = number of events
+cron_schedule = 23 * * * *
+description = Chance the alert requires action? High. If 404s are occurring within the search head cluster then it is possible there is a member out of sync...
+dispatch.earliest_time = -60m@m
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.chartHeight = 628
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```A 404 error appearing on some search head peers but not others might imply a synchronisation issue within the search head cluster has occurred; this might require correction, potentially through a re-sync http://docs.splunk.com/Documentation/Splunk/latest/DistSearch/HowconfrepoworksinSHC#Replication_synchronization_issues ```\
+index=_internal `searchheadhosts` "find saved search with name" sourcetype=splunkd `splunkadmins_splunkd_source`\
+| rex field=err "/servicesNS/(?P<username>[^/]+)/(?P<appName>[^/]+)"\
+| rex "'(?P<searchname>[^']+)'.$"\
+| eval message=coalesce(message,event_message)\
+| stats count, min(_time) AS firstSeen, max(_time) AS lastSeen, values(username) AS username, values(appName) AS appName, values(searchname) AS searchName by message, peer\
+| eval firstSeen = strftime(firstSeen, "%+"), lastSeen=strftime(lastSeen, "%+")\
+| table username, appName, searchName, firstSeen, lastSeen, count, peer, message
+disabled = 1
+
+[IndexerLevel - Too many events with the same timestamp]
+alert.suppress = 0
+alert.track = 1
+counttype = number of events
+cron_schedule = 44 4 * * *
+description = Find excessive numbers of events with the same timestamp so they can be reviewed to see if the data is valid or not
+dispatch.earliest_time = -24h@h
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.chartHeight = 628
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```Too many events with the same timestamp have been found.
This may be a sign of poor quality data, or a problematic log file```\
+index=_internal "Too many events" (`indexerhosts`) OR (`heavyforwarderhosts`) `splunkadmins_splunkd_source` `splunkadmins_toomany_sametimestamp`\
+| cluster showcount=true \
+| rex "Too many events \((?P<number>[0-9]+.)"\
+| rename data_host AS host, data_sourcetype AS sourcetype, data_source AS source\
+| eval invesDataSource = replace(source, "\\\\", "\\\\\\\\"), invesStartTime=floor(_time)\
+| eval investigationQuery="```You will need to set the time settings manually as the log does not provide the parsed time, only the indexed time the issue occurred at...``` index=* host=" . host . " sourcetype=\"" . sourcetype . "\" source=\"" . invesDataSource . "\" _index_earliest=" . invesStartTime\
+| eval message=coalesce(message,event_message)\
+| table host, sourcetype, source, number, cluster_count, message, _time, investigationQuery
+disabled = 1
+
+#Updated as per issue #3, tstats may return the "host" field if host:: is in the raw data where the tstats searches
+#the workaround provided by support is to make : and :: major segmenters, or to add a where clause (search clause used)
+[SearchHeadLevel - Detect MongoDB errors]
+alert.severity = 4
+alert.suppress = 0
+alert.track = 1
+counttype = number of events
+cron_schedule = 53 * * * *
+description = Chance the alert requires action? High. If there are errors in the mongo log files on a search head cluster (unrelated to restarts) then this might indicate a kvstore issue which needs attention
+dispatch.earliest_time = -60m@m
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.chartHeight = 628
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```The main goal of this alert is to find errors which might not appear in splunkd.log but are critical to keeping the kvstore running on the search heads. Please check the mongod.log file for further information; the additional count field is simply determining that mongo is still logging...\
+Attempt to find errors in the mongod log and make sure the errors do not relate to shutdown events in the search head cluster. Since this will ignore any events when either cluster shuts down, it might not be sensitive enough for some use cases...```\
+index=_internal `searchheadhosts` `splunkadmins_mongo_source` (" E " OR " F " OR " W ") ```https://jira.mongodb.org/browse/SERVER-42078 advises this is harmless``` NOT "update of non-mod failed" `splunkadmins_mongodb_errors`\
+| regex _raw="^\s+?\S+\s+[EF]" \
+| search ```Exclude time periods where shutdowns were occurring``` NOT [`splunkadmins_shutdown_time(searchheadhosts,60,60)`]\
+| eventstats max(_time) AS mostRecent, min(_time) AS firstSeen by host\
+| bin _time span=10m \
+| stats values(_raw) AS logMessages, max(mostRecent) AS mostRecent, min(firstSeen) AS firstSeen by _time, host \
+ ```One final symptom that appears when mongodb is dead is that the logging just stops (zero data); however, this proved to be tricky in Splunk, so the below query uses a few tricks to ensure the data will show zero values even if the server stops reporting.
timechart was recommended by splunkanswers as it creates a timebucket with null values if no data is found...```\
+| append \
+    [ | tstats prestats=t count where index=_internal `searchheadhosts` `splunkadmins_mongo_source` by host, _time span=5m \
+    | search `searchheadhosts`\
+    | timechart limit=0 partial=f span=5m count by host \
+    | fillnull \
+    | untable _time, host, count \
+    | stats max(_time) AS mostRecent, min(_time) AS firstSeen, last(count) AS lastCount by host \
+    | where lastCount=0 \
+    | eval logMessages="Zero log entries found at this time, mongod might not be running, please investigate" \
+    | fields - lastCount \
+    | eval _time=now() ] \
+| eval mostRecent = strftime(mostRecent, "%+"), firstSeen=strftime(firstSeen, "%+")\
+| fields _time, host, firstSeen, mostRecent, logMessages\
+| search ```Just in case...``` `splunkadmins_mongodb_errors2`
+disabled = 1
+
+[IndexerLevel - Cold data location approaching size limits]
+alert.severity = 4
+alert.suppress = 0
+alert.track = 1
+counttype = number of events
+cron_schedule = 13 7 * * *
+description = Chance the alert requires action? High. One or more indexes are approaching the disk limits on their cold data; the buckets will therefore roll to frozen once this limit is reached...
+dispatch.earliest_time = -24h
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.chartHeight = 628
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = | rest `splunkindexerhostsvalue` /services/data/indexes/ datatype=all \
+    ```This search attempts to find indexes which are about to start rolling buckets to frozen due to disk space issues by checking what percentage of the allocated cold section of disk is used.
It does not take into account any volume sizing...\
+    This is the more proactive form of IndexerLevel - Buckets are been frozen due to index sizing``` \
+| join title splunk_server type=outer \
+    [| rest `splunkindexerhostsvalue` /services/data/indexes-extended/ datatype=all] \
+| stats values(bucket_dirs.cold.bucket_size) AS currentColdSizeMB, values(bucket_dirs.home.warm_bucket_size) AS currentWarmSizeMB, max(bucket_dirs.home.event_max_time) AS latestTime, min(bucket_dirs.cold.event_min_time) AS earliestTime, min(bucket_dirs.home.event_min_time) AS earliestTimeHot, values(coldPath.maxDataSizeMB) AS coldPathSizeLimitMB, values(currentDBSizeMB) AS hotSizeMB, values(maxTotalDataSizeMB) AS maxTotalDataSizeMB, values(frozenTimePeriodInSecs) AS frozenTimePeriodInSecs, values(maxDataSize) AS maxDataSize, values(homePath.maxDataSizeMB) AS hotPathMaxDataSizeMB, values(bucket_dirs.home.warm_bucket_count) AS currentWarmCount, values(maxWarmDBCount) AS maxWarmDBCount by name, splunk_server \
+| eval currentColdSizeMB=coalesce(currentColdSizeMB,currentWarmSizeMB), earliestTime=coalesce(earliestTime,earliestTimeHot)\
+| eval "Days of data based on epoch values"=round((latestTime-earliestTime)/3600/24) \
+| rename name AS index, splunk_server AS indexer\
+    ```Things do get a little bit messy here: if the cold path size is unlimited, the remaining data is the maxTotalDataSizeMB minus what we have already used in the hot section (not a perfect calculation but close enough for our purposes)``` \
+| eval warm_bucket_percent = (100 / maxWarmDBCount) * currentWarmCount \
+| eval coldPathSizeLimitMB=case(warm_bucket_percent>95 AND coldPathSizeLimitMB==0,maxTotalDataSizeMB-currentWarmSizeMB,coldPathSizeLimitMB==0 AND hotPathMaxDataSizeMB==0,maxTotalDataSizeMB,coldPathSizeLimitMB==0 AND hotPathMaxDataSizeMB!=0,maxTotalDataSizeMB-hotPathMaxDataSizeMB,1=1,coldPathSizeLimitMB) \
+| eval percUsed=round((currentColdSizeMB/coldPathSizeLimitMB)*100,2) \
+| eval frozenTimeInDays=frozenTimePeriodInSecs/60/60/24 \
+| eval maxDataSize=case(maxDataSize="auto","750",maxDataSize="auto_high_volume","10240",true(),maxDataSize)\
+| eval worstCaseBucketCountLeft=floor((coldPathSizeLimitMB-currentColdSizeMB)/maxDataSize)\
+| where percUsed>`splunkadmins_colddata_percused` \
+| search `splunkadmins_colddata`\
+| table index, indexer, currentColdSizeMB, coldPathSizeLimitMB, percUsed, frozenTimeInDays, "Days of data based on epoch values", worstCaseBucketCountLeft
+disabled = 1
+
+[AllSplunkEnterpriseLevel - Unable to dispatch searches due to disk space]
+alert.severity = 4
+alert.suppress = 1
+alert.suppress.period = 4h
+alert.track = 1
+counttype = number of events
+cron_schedule = 32 * * * *
+description = Chance the alert requires action? High. Unless the disk space issue clears itself, some action will be required either now or to prevent future failures.
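+# The "minimum free disk space" threshold that the scheduler is hitting comes from server.conf
+# on the affected server; a sketch of the relevant setting with its default value, in MB of
+# free space required on the dispatch volume:
+#   [diskUsage]
+#   minFreeSpace = 5000
+# Freeing disk space (or, with care, lowering this value) clears the condition.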
+dispatch.earliest_time = -60m@m
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.chartHeight = 628
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```Detect when the scheduler is unable to run search commands due to a lack of disk space on the filesystem```\
+index=_internal `splunkenterprisehosts` (sourcetype=splunkd `splunkadmins_splunkd_source` "Search not executed: Dispatch Command: The minimum free disk space") OR (sourcetype=scheduler "The minimum free disk space * reached")\
+| eval message=coalesce(message,event_message)\
+| stats count, min(_time) AS firstSeen, max(_time) AS mostRecent, max(_raw) AS lastExample by host, message\
+| eval firstSeen=strftime(firstSeen, "%+"), mostRecent=strftime(mostRecent, "%+")\
+| table host, count, firstSeen, mostRecent, lastExample
+disabled = 1
+
+[IndexerLevel - Unclean Shutdown - Fsck]
+alert.suppress = 0
+alert.track = 1
+alert_condition = search host=*
+counttype = custom
+cron_schedule = 4,19,34,49 * * * *
+description = Chance the alert requires action? Moderate. One or more indexes are mentioned as corrupt in the log files; this should auto-repair, but it may cause errors in the search interface until the repair is complete. You may also wish to try IndexerLevel - Corrupt buckets via DBInspect
+dispatch.earliest_time = -15m
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.chartHeight = 628
+display.visualizations.charting.chart = line
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```Attempt to detect if an indexer crash resulted in corrupt buckets, if so alert the admin so they are aware...\
+The indexer is likely going to print the line "WARN IndexerService - Indexer was started dirty: splunkd startup may take longer than usual; searches may not be accurate until background fsck completes.", however we also want to know if buckets were corrupted. In a clustered environment the corrupt buckets should be added to the cluster master fixup list and repaired online; in a non-clustered environment refer to the Splunk fsck documentation. Note that I'm unable to find a log message to advise when the OnlineFsck completes in splunkd.log.
You may also wish to refer to alert IndexerLevel - Corrupt buckets via DBInspect\
+FYI fixup lines in the splunkd log file may look like "06-12-2018 07:31:47.160 +0000 INFO ProcessTracker - (child_407__Fsck) Fsck - (entire bucket) Rebuild for bucket='/opt/splunk/var/lib/splunk/indexname/db/db_1528466340_1520517600_38_A25ECA32-B33E-4469-8C76-22190FDCC8CB' took 86.26 seconds.```\
+index=_internal sourcetype=splunkd `splunkadmins_splunkd_source` `indexerhosts` "At restart after an unclean shutdown found bucket path" OR ("finished moving hot to warm" caller=init_roll) OR (OnlineFsck "Scheduled repair fsck* kind='entire bucket'")\
+| rex field=path "(?P<pathWithoutHot>.*)(/|\\\\)\S+"\
+| join type=outer pathWithoutHot \
+    [| rest /services/data/indexes `splunkindexerhostsvalue` datatype=all \
+    | fields homePath_expanded, title \
+    | rename homePath_expanded AS pathWithoutHot, title AS idxFromREST]\
+| eventstats max(_time) AS mostRecent by idx, host\
+| bin _time span=5m\
+| eval idx=coalesce(idxFromREST, idx)\
+| stats count(eval(searchmatch("unclean shutdown"))) AS uncleanCount, count(eval(searchmatch("Scheduled repair fsck"))) AS scheduledRepairCount, count(eval(searchmatch("finished moving hot to warm"))) AS hotToWarmCount, max(mostRecent) AS mostRecent by idx, host, _time\
+| where uncleanCount>0\
+| append\
+    [| makeresults count=1\
+    | eval idx="#The message \"At restart after an unclean shutdown found bucket path...\" results in buckets being rolled/repaired. Users may see errors when running searches, such as \"Failed to read size=2 event(s) from rawdata in bucket=...Rawdata may be corrupt, see search.log/splunk_search_messages sourcetype. Results may be incomplete!\" (OR) \"idx=_internal Could not read event: cd=(n/a). Results may be incomplete ! (logging only the first such error; enable DEBUG to see the rest).\" This appears to resolve itself in Splunk 7+ when the fsck's complete. I have not determined how to find a completion time..."]\
+| fields - _time, uncleanCount\
+| sort idx\
+| eval mostRecent=strftime(mostRecent, "%+")\
+| addcoltotals labelfield=host label="Total Count"
+disabled = 1
+
+[AllSplunkEnterpriseLevel - Detect LDAP groups that no longer exist]
+alert.severity = 2
+alert.suppress = 0
+alert.track = 1
+counttype = number of events
+cron_schedule = 57 6 * * *
+description = Chance the alert requires action? High.
An LDAP group is configured in Splunk that does not exist in LDAP; this is a minor issue, but it can be fixed by removing the group from authentication.conf
+dispatch.earliest_time = -24h@h
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.chartHeight = 628
+display.visualizations.charting.chart = line
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```Find any LDAP groups that are reporting that they do not exist so they can be removed from the Splunk configuration\
+This appears to occur only after restarts of the Splunk server, however it is useful to know about as the authentication.conf can be cleaned up once found```\
+index=_internal sourcetype=splunkd `splunkenterprisehosts` `splunkadmins_splunkd_source` "was not found on the LDAP server"\
+| eval message=coalesce(message,event_message)\
+| stats max(_time) AS mostRecentlySeen, min(_time) AS firstSeen by message, host\
+| eval mostRecentlySeen=strftime(mostRecentlySeen, "%+"), firstSeen=strftime(firstSeen, "%+")\
+| fields host, mostRecentlySeen, firstSeen, message
+disabled = 1
+
+[ClusterMasterLevel - Per index status]
+alert.severity = 4
+alert.suppress = 0
+alert.track = 1
+counttype = number of events
+cron_schedule = */5 * * * *
+description = Chance the alert requires action? Moderate. Note this alert is a candidate for removal as of 2023-11-23; the goal when I wrote it was to determine on a per-index basis *which* index or indexes were missing a "searchable" bucket copy, however it's rarely the case that you need to alert on this per-index. Check if the is_searchable flag is set to false *or* detect when an index is not matching the search factor of at least 1 copy between the sites. Changed to 5 min intervals to pass certification; you may want to run this more regularly. Cluster master specific? Yes
+dispatch.earliest_time = -24h@h
+dispatch.latest_time = now
+display.general.type = statistics
+display.page.search.tab = statistics
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = | rest /services/cluster/master/indexes `splunkadmins_clustermaster_host`\
+| foreach searchable_copies_tracker.*.actual_copies_per_slot \
+    [ eval expectedMatchesActual_<<MATCHSTR>>=if('searchable_copies_tracker.<<MATCHSTR>>.expected_total_per_slot'=='searchable_copies_tracker.<<MATCHSTR>>.actual_copies_per_slot',"true","false") ]\
+| fields is_searchable, expectedMatchesActual_*, num_buckets, searchable*, title\
+| eval failureCount=0\
+| foreach expectedMatchesActual_*\
+    [ eval failureCount=if('expectedMatchesActual_<<MATCHSTR>>'=="false",failureCount+1,failureCount) ]\
+| where failureCount > `splunkadmins_clustermaster_failurecount` OR is_searchable=0
+disabled = 1
+
+[ClusterMasterLevel - Primary bucket count per peer]
+action.email.useNSSubject = 1
+alert.track = 0
+description = Report only? Yes. Graph the number of primary buckets for site0 on each peer.
Note that in an environment with a large number of peers/buckets this query will be very expensive (memory-wise) and should be used with caution
+dispatch.earliest_time = -24h@h
+dispatch.latest_time = now
+display.general.timeRangePicker.show = 0
+display.general.type = visualizations
+display.page.search.tab = visualizations
+display.statistics.show = 0
+display.visualizations.trellis.splitBy = PeerName
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = | rest /services/cluster/master/buckets `splunkadmins_clustermaster_host` \
+| search ```Note in larger environments this can cause an issue due to the number of peers returning data, so use with caution. The idea for this query comes from Splunk support & https://answers.splunk.com/answers/234717/how-to-get-list-of-buckets-which-are-having-issues.html , attempt to determine the count of primary buckets per peer for site0. This report is designed to provide 1 example of a useful REST endpoint``` standalone=0 frozen=0\
+| rename primaries_by_site.site0 AS peerGUID\
+| join type=outer peerGUID [ rest /services/cluster/master/peers `splunkadmins_clustermaster_host` \
+| fields active_* host* label title status site\
+| eval PeerName= site + ":" + label + ":" + host_port_pair\
+| rename title AS peerGUID\
+| rename site AS peerSite\
+| table peerGUID PeerName peerSite]\
+| stats count by PeerName\
+| chart sum(count) AS count by PeerName
+
+#An overly complex alert due to the difficulty in expanding macros within Splunk programmatically
+[SearchHeadLevel - Scheduled searches not specifying an index macro version]
+alert.suppress = 0
+alert.track = 1
+counttype = number of events
+cron_schedule = 37 6 * * 1-5
+description = Chance the alert requires action? High. These searches are either using index=* or not specifying an index at all and relying on the default set of indexes. Can be fixed by the end user? Yes. Search Head specific? Yes. Please ensure the SearchHeadLevel - Macro report is also enabled for this to work as expected
+dispatch.earliest_time = -24h@h
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.chartHeight = 628
+display.visualizations.charting.chart = line
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = | rest /servicesNS/-/-/saved/searches\
+| search ```Look over all scheduled searches and find those not specifying/narrowing down to an index, or using the index=* trick. This version is dealing with those using macros. Please ensure the SearchHeadLevel - Macro report is also enabled for this to work as expected.
Attempted to use https://answers.splunk.com/answers/186698/how-can-i-expand-a-macro-definition-in-the-search.html but the intentionsparser does not work via a REST call within Splunk, only by an external call in Splunk 7...``` `splunkadmins_scheduledsearches_without_index_macro`\
+| table title , description, eai:acl.app, eai:acl.owner, qualifiedSearch, next_scheduled_time \
+| search next_scheduled_time!="" \
+| regex qualifiedSearch!=".*index\s*(!?)=\s*([^*]|\*\S+)" \
+| regex qualifiedSearch="^\s*search " \
+| rex field=qualifiedSearch "(?s)^(?P<exampleQueryToDetermineIndexes>[^\|]+)"\
+| regex exampleQueryToDetermineIndexes="`" \
+| rename eai:acl.owner AS owner, eai:acl.app AS Application \
+| fields title, owner, description, Application, qualifiedSearch, next_scheduled_time \
+| eval search=qualifiedSearch \
+| `splunkadmins_macro_sub("search")` \
+| `splunkadmins_macro_sub("search")` \
+| eval exampleQueryToDetermineIndexes=exampleQueryToDetermineIndexes . "| stats values(index) AS index | format | fields search | eval search=replace(search,\"\\)\",\"\"), search=replace(search,\"\\(\",\"\"), search=if(search==\"NOT \",\"No indexes found\",search)"\
+| stats values(search) AS search, first(exampleQueryToDetermineIndexes) AS exampleQueryToDetermineIndexes by title, owner, description, Application, next_scheduled_time \
+| nomv search \
+| regex search!=".*index\s*(!?)=\s*([^*]|\*\S+)"
+disabled = 1
+
+#Enable scheduling on this report if you use SearchHeadLevel - Scheduled searches not specifying an index macro version or SearchHeadLevel - User - Dashboards searching all indexes macro version or the other reports that rely on this
+[SearchHeadLevel - Macro report]
+action.email.useNSSubject = 1
+alert.track = 0
+cron_schedule = 50 5 * * *
+description = Report only? Yes. This report is required to support SearchHeadLevel - Scheduled searches not specifying an index macro version AND SearchHeadLevel - User - Dashboards searching all indexes macro version. Search Head specific? Yes
+dispatch.earliest_time = @d
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.timeRangePicker.show = 0
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.chartHeight = 628
+display.visualizations.charting.chart = line
+display.visualizations.show = 0
+enableSched = 0
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+schedule_window = 30
+search = | rest "/servicesNS/-/-/configs/conf-macros?count=-1" `splunkadmins_restmacro` \
+| search eai:acl.sharing!="user"\
+| rename eai:acl.sharing AS sharing\
+    ```Potentially having an additional field for server group would add another level of accuracy to the lookup; however, in this env the chance of macros with the same name but different definitions is low enough that this might be a waste of time...```\
+| rename eai:acl.app AS app\
+| fields title, app, definition, sharing\
+| eval splunk_server=`splunkadmins_splunk_server_name` \
+```At this point you can add in remote search heads for macros by using the macro | append [ `AuditLogsMacroReport_Helper("<your remote host>", "remote_user", "remote_password")` | splunk_server=...
]\
+The lookup is going to use the first value it sees, so just dedup on app/title```\
+| eval app=if(sharing=="global","global",'app')\
+| stats first(definition) AS definition, values(sharing) AS sharing by title, app, splunk_server\
+| outputlookup splunkadmins_macros
+
+[AllSplunkEnterpriseLevel - Non-existent roles are assigned to users]
+alert.severity = 4
+alert.suppress = 0
+alert.track = 1
+counttype = number of events
+cron_schedule = 57 6 * * 1-5
+description = Chance the alert requires action? High. This particular alert is harmless but can cause some very strange results if not resolved; the fix is documented within the alert. Note this does not seem to work with SAML users.
+dispatch.earliest_time = -24h@h
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.chartHeight = 628
+display.visualizations.charting.chart = line
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```Attempt to find where there are deleted roles assigned to users; this should only happen when the user was created with the Splunk authentication system. The fix is to open the user in the settings menu and find any user with the mentioned role, and then to save the user with no changes; this will wipe the non-existent roles from the user```\
+index=_internal sourcetype=splunkd `splunkenterprisehosts` `splunkadmins_splunkd_source` AuthorizationManager "Unknown role"\
+| eval message=coalesce(message,event_message)\
+| stats max(_time) AS lastSeen, first(_raw) AS rawMessage by message\
+| eval actionToTake="Find any users in the settings menu with the mentioned role and save them without changes to remove the role. Does not seem to work with SAML users."\
+| eval lastSeen = strftime(lastSeen, "%+")\
+| rex field=message "'(?<role>[^']+)'"\
+| eventstats values(role) as unknown_roles\
+| nomv unknown_roles\
+| eval commandForFindingUsers = "| rest `searchheadsplunkservers` /services/authentication/users f=roles f=title | search roles IN (".unknown_roles.") | stats values(roles) as roles by title"\
+| table lastSeen, message, rawMessage, actionToTake, commandForFindingUsers
+disabled = 1
+
+[IndexerLevel - Index not defined]
+alert.severity = 4
+alert.suppress = 0
+alert.track = 1
+counttype = number of events
+cron_schedule = 33 * * * *
+description = Chance the alert requires action? High.
Either the remote forwarder is sending to the wrong index name or the index has not been defined; either way, the data will be rejected by the indexer
+dispatch.earliest_time = -60m@m
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.chartHeight = 628
+display.visualizations.charting.chart = line
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```Detect if data is being sent to the indexers for an index which is not yet configured```\
+index=_internal "Received event for unconfigured" sourcetype=splunkd `splunkadmins_splunkd_source` IndexerService "Received event for unconfigured" `indexerhosts`\
+| rex "index=(?P<index>[^ ]+).*source=\"source::(?P<source>[^\"]+)\" host=\"host::(?P<host>[^\"]+)"\
+| eval message=coalesce(message,event_message)\
+| stats min(_time) AS firstSeen, max(_time) AS lastSeen, values(source) AS sourceList, values(host) AS hostsSendingToThisIndex, first(_raw) AS message by index\
+| eval sourceList=mvjoin(sourceList, ", "), firstSeen=strftime(firstSeen, "%+"), lastSeen=strftime(lastSeen, "%+")
+disabled = 1
+
+[SearchHeadLevel - Saved Searches with privileged owners and excessive write perms]
+alert.suppress = 0
+alert.track = 1
+counttype = number of events
+cron_schedule = 56 5 * * *
+description = This is a rudimentary way of detecting scheduled searches or reports that could be used by a non-privileged user to run the alert/report as a privileged user through the search scheduling functionality. Search Head specific? Yes
+dispatch.earliest_time = -24h@h
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.chartHeight = 628
+display.visualizations.charting.chart = line
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = | rest /servicesNS/-/-/saved/searches `splunkadmins_restmacro` \
+| fields title, eai:acl.sharing, eai:acl.perms.read, eai:acl.perms.write, description, disabled, eai:acl.owner, dispatchAs, eai:acl.app \
+| search ```Find alerts where the owner is set to a user (not nobody), the sharing is non-private, and finally the owner has an admin or power role``` eai:acl.owner!="nobody" eai:acl.sharing!="user" dispatchAs=owner \
+    [| rest /services/authentication/users `searchheadsplunkservers` \
+    | search roles=admin OR roles=power \
+    | fields title \
+    | rename title AS eai:acl.owner] \
+| eval writeCount=mvcount('eai:acl.perms.write') \
+| eval writePerms=mvjoin('eai:acl.perms.write', ",") \
+| search ```Exclude by macro``` `splunkadmins_privilegedowners` \
+| search ```If only the admin or power role can write to the alert then it's no problem...``` NOT (writeCount=1 (eai:acl.perms.write="admin" OR eai:acl.perms.write="power")) \
+| search ```power users have admin-like abilities, these users have a similar level of read access so less of a security concern...``` writePerms!="admin,power" \
+    ```If the alert is coming from an application that only admins can see, then I'm not concerned, as the user should not be able to access the app to edit the search... (in theory).
We also ignore if there are no read permissions at all...``` \
+    NOT ( \
+    [| rest /services/apps/local `splunkadmins_restmacro` \
+    | fields title, visible, eai:acl.perms.read \
+    | search ```If we cannot access the application then I'm assuming making it visible does not matter...``` visible=1 \
+    | eval readCount=mvcount('eai:acl.perms.read') \
+    | search ```If the application can only be read by admin or power users, then we can safely ignore the alerts within it...``` (readCount=1 (eai:acl.perms.read="admin" OR eai:acl.perms.read="power") ) \
+    | fields title \
+    | rename title AS eai:acl.app]) \
+| where isnotnull('eai:acl.perms.write') \
+| rename title AS "Alert Name", eai:acl.perms.read AS "read perms", eai:acl.perms.write AS "write perms", eai:acl.app AS app, eai:acl.owner AS owner \
+| sort app, "Alert Name" \
+| table app, "Alert Name", description, disabled, "read perms", "write perms", owner
+disabled = 1
+
+[IndexerLevel - Search Failures]
+alert.suppress = 0
+alert.track = 1
+counttype = number of events
+cron_schedule = 48 6 * * *
+description = Chance the alert requires action? Moderate. One or more search jobs are failing to run for some reason; this may require investigation. The only issue so far has been around search factory/unknown search command but this search is generic just in case a new issue appears. Note that if you are running a modern Splunk version you may wish to use "SearchHeadLevel - Search Messages admins only" and "SearchHeadLevel - Search Messages user level" instead as they will detect issues at SH level
+dispatch.earliest_time = -24h@h
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.chartHeight = 628
+display.visualizations.charting.chart = line
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```Attempt to detect search failures of interest, such as Search Factory: Unknown search command 'base64' so they can be fixed before it becomes an issue for multiple users. The search is generic to attempt to detect any new errors. Note that if you are running a modern Splunk version you may wish to use \"SearchHeadLevel - Search Messages admins only\" and \"SearchHeadLevel - Search Messages user level\" instead as they detect the issues at SH level```\
+index=_internal `indexerhosts` sourcetype=splunkd_remote_searches "ERROR" OR "WARN" NOT "remote_metrics.json does not exist before reading" NOT ", Broken pipe" NOT ", Connection closed by peer" NOT ", Connection reset by peer" NOT "is already running" NOT "Local side shutting down" NOT ("ERROR StreamedSearch" "Success") \
+| regex "\+0000 (ERROR|WARN)" \
+| search ```Allow exclusions via macro...``` `splunkadmins_searchfailures`\
+| cluster field=sid showcount=true \
+| table host, cluster_count, _raw \
+| eval indexer_cluster=`indexer_cluster_name(host)`
+disabled = 1
+
+[SearchHeadLevel - User - Dashboards searching all indexes macro version]
+alert.suppress = 0
+alert.track = 1
+counttype = number of events
+cron_schedule = 32 6 * * 1-5
+description = Chance the alert requires action? High. All dashboard panels that are using a macro and do not have an index= setting or use index=* are highlighted by this alert. Can be fixed by the end user? Yes. Search Head specific?
Yes
+dispatch.earliest_time = -24h@h
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.chartHeight = 628
+display.visualizations.charting.chart = line
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = | rest /servicesNS/-/-/data/ui/views `splunkadmins_restmacro`\
+    ```A dashboard searching all indexes is an issue just like a scheduled search querying all indexes or using the index=* trick. This version is dealing with those using macros. Please ensure the SearchHeadLevel - Macro report is also enabled for this to work as expected.``` \
+    eai:data=*query* NOT (eai:appName=simple_xml_examples eai:acl.sharing=app) NOT (eai:appName=nmon eai:acl.sharing=app) NOT (eai:appName=splunk_app_aws eai:acl.sharing=app) \
+| regex eai:data="<search.*" \
+| mvexpand theSearch \
+| rex field=eai:data "(?s)(?P<theSearch><search(?!String)[^>]*>[^<]*<query>.*?)<\/query>" max_match=200 \
+| mvexpand theSearch \
+| rex field=theSearch "(?s)<search(?P<searchInfo>[^>]*)>[^<]*<query>(?P<theQuery>.*)" \
+| search ```If we are seeing a post process search then we don't want to check if it has index= because that is likely only in the base query. These are also various exclusions for legitimate searches that will not involve scanning all indexes, such as rest or a savedsearch or similar``` searchInfo!="*base*" \
+| rename eai:appName AS application, eai:acl.sharing AS sharing, eai:acl.owner AS identity, label AS name \
+| table theQuery, application, identity, sharing, name, splunk_server, title \
+| regex theQuery!="index\s*=(?!\s*\*)" \
+| regex theQuery!="^(\()?\s*(\`|\$[^|]+\$|eventtype=|rest |<!\[CDATA\[\s*\|\s*((acl)?inputlookup|rest) |\|)" \
+| rex field=theQuery "(?s)^(?P<exampleQueryToDetermineIndexes>[^\|]+)" \
+| regex exampleQueryToDetermineIndexes="\`"\
+| eval exampleQueryToDetermineIndexes=exampleQueryToDetermineIndexes . "| stats values(index) AS index | format | fields search | eval search=replace(search,\"\\)\",\"\"), search=replace(search,\"\\(\",\"\"), search=if(search==\"NOT \",\"No indexes found\",search)" \
+| eval search=theQuery\
+| `splunkadmins_macro_sub("search")` \
+| `splunkadmins_macro_sub("search")` \
+| stats values(search) AS search, first(exampleQueryToDetermineIndexes) AS exampleQueryToDetermineIndexes by theQuery, application, identity, sharing, name, title, splunk_server\
+| table search, application, identity, sharing, name, splunk_server, title, exampleQueryToDetermineIndexes\
+| nomv search\
+| regex search!="index\s*=(?!\s*\*)"
+disabled = 1
+
+[SearchHeadLevel - Captain Switchover Occurring]
+alert.suppress = 1
+alert.suppress.period = 120m
+alert.track = 1
+counttype = number of events
+cron_schedule = 37 * * * *
+description = Chance the alert requires action? Moderate. If the captain has been changed then scheduled searches and alerts will be paused during the switchover; if this is not part of a restart then something is likely wrong... Ironically, this alert will not run while the issue is actually occurring, which is why the time range window is three times the runtime of the alert, in case it is missed once or twice...
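+# For a planned switchover (e.g. before patching the current captain) the transfer can be done
+# deliberately, which this alert then excludes via the splunkadmins_transfer_captain_times macro
+# used in the search below; a sketch, where the mgmt_uri is the member that should become
+# captain (example URI only):
+#   $SPLUNK_HOME/bin/splunk transfer shcluster-captain -mgmt_uri https://sh2.example.com:8089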
+dispatch.earliest_time = -180m
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.visualizations.chartHeight = 628
+display.visualizations.charting.chart = line
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = index=_internal `searchheadhosts` sourcetype=splunkd `splunkadmins_splunkd_source` "SHCRaftConsensus" NOT "failed appendEntriesRequest err" NOT (SHCRaftConsensus "NOT_LEADER") `splunkadmins_captain_switchover`\
+| search ```Exclude the search head shutdown times``` NOT [`splunkadmins_shutdown_time(searchheadhosts,20,120)`] ```Exclude manual transfer``` NOT [`splunkadmins_transfer_captain_times(searchheadhosts,20,120)`]\
+| cluster showcount=true\
+| fields _time, cluster_count, _raw\
+| sort - _time
+disabled = 1
+
+[AllSplunkEnterpriseLevel - Splunk Servers with resource starvation]
+alert.suppress = 0
+alert.track = 1
+counttype = number of events
+cron_schedule = 13 */2 * * *
+description = Chance the alert requires action? Moderate. Detect when a Splunk enterprise host is reporting that it is seeing excessive response times while running operations
+dispatch.earliest_time = -120m@m
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.chartHeight = 628
+display.visualizations.charting.chart = line
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```Attempt to find entries in the splunkd logs that indicate that Splunk is resource constrained and requires more CPU or similar```\
+index=_internal `indexerhosts` sourcetype=splunkd `splunkadmins_splunkd_source` "Might indicate hardware or splunk limitations" OR "took longer than" ```This is useful for reporting but not so useful for alerting... OR "WARN PeriodicReapingTimeout"``` NOT "Might indicate slow ldap server." ```Add in OR (WARN ConfMetrics) ?)``` \
+| rex "^[\d-]+ [\d:\.]+( )+[\+-]?\d+( )+[^ ]+( )+(?P<componentAndArea>([^ ]+( )+){3}).*\((?P<number>\d+) milliseconds" \
+| rex "^[\d-]+ [\d:\.]+( )+[\+-]?\d+( )+[^ ]+( )+(?P<componentAndArea2>DispatchManager\s+([^ ]+( )+){3}).*elapsed_ms=(?P<number3>\d+)" \
+| rex "Spent (?P<number2>\d+)"\
+| rex "reaping (?P<area>([^ ]+ ){2})"\
+| eval componentAndArea=case(isnotnull(componentAndArea2),componentAndArea2,isnull(componentAndArea),component . "_" . area,1=1,componentAndArea), number=coalesce(number,number2,number3)\
+| stats count, avg(number) AS avgTimeInSeconds, max(number) AS maxTimeInSeconds, max(_time) AS mostRecent, min(_time) AS firstSeen by componentAndArea, host\
+| search ```Allow custom exclusions``` `splunkadmins_resource_starvation`\
+| sort - mostRecent\
+| eval firstSeen=strftime(firstSeen, "%+"), mostRecent=strftime(mostRecent, "%+"), avgTimeInSeconds=round(avgTimeInSeconds/1000,2), maxTimeInSeconds=round(maxTimeInSeconds/1000,2)
+disabled = 1
+
+[IndexerLevel - S2SFileReceiver Error]
+alert.suppress = 0
+alert.track = 1
+counttype = number of events
+cron_schedule = 36 */3 * * *
+description = Chance the alert requires action? Moderate. One or more indexing peers are having issues with receiving file replications and may require investigation
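+# The `splunkadmins_s2sfilereceiver` macro referenced below is a search-time filter that allows
+# environment-specific exclusions. A minimal macros.conf sketch (hypothetical host pattern only):
+# [splunkadmins_s2sfilereceiver]
+# definition = host!=decommissioned-idx*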
+dispatch.earliest_time = -3h
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.chartHeight = 628
+display.visualizations.charting.chart = line
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```An attempt to detect excessive numbers of S2SFileReceiver / TcpInputProc failures on the indexing tier, these may indicate an issue. We are only looking for errors about replication data\
+An indexer peer that was constantly logging "S2SFileReceiver...type=data_model already exists!" / "S2SFileReceiver...type=report_acceleration already exists!" has been fixed by restart (once so far)```\
+index=_internal sourcetype=splunkd `indexerhosts` `splunkadmins_splunkd_source` ("ERROR TcpInputProc" "event=replicationData") OR ("ERROR S2SFileReceiver" "error alerting slave about") OR ("ERROR S2SFileReceiver" "error adding new summary replica to slave")\
+| rex "(?P<error>ERROR [^-]+- [^=]+=)(?P<postEquals>[^ ]+) (?P<postEquals2>[^=]+=[^ ]+)"\
+| rex field=error "(?P<type>type=.*)"\
+| eval error=if(postEquals=="onFileAborted" OR postEquals=="replicationData",error . " " . postEquals . " " . postEquals2,error . " " . type)\
+| stats count, max(_time) AS mostRecent by host, error\
+| search ```Allow exclusion via macro``` `splunkadmins_s2sfilereceiver`\
+| eval mostRecent=strftime(mostRecent, "%+")
+disabled = 1
+
+[SearchHeadLevel - Disabled modular inputs are running]
+action.keyindicator.invert = 0
+action.makestreams.param.verbose = 0
+action.nbtstat.param.verbose = 0
+action.notable.param.verbose = 0
+action.nslookup.param.verbose = 0
+action.ping.param.verbose = 0
+action.risk.param.verbose = 0
+action.threat_add.param.verbose = 0
+alert.suppress = 0
+alert.track = 1
+counttype = number of events
+cron_schedule = 39 5 * * *
+description = Chance the alert requires action? Moderate. While the modular input is disabled it appears to be running according to the introspection logs. The solution is to remove the inputs.conf and inputs.conf.spec from the relevant app, or at least remove the unused modular inputs. Note this alert is only relevant to the search head/cluster it is running on and this is *not* an issue as such from the Splunk support point of view, however the python scripts can impact server performance... Search Head specific? Yes
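+# To preview the REST data the join below relies on, the subsearch can be run standalone first,
+# for example (a sketch taken directly from the search below, output columns reordered):
+# | rest `splunkadmins_restmacro` /servicesNS/-/-/configs/conf-inputs
+# | search title!="*://*" disabled=1
+# | table eai:acl.app, title, disabled, interval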
+dispatch.earliest_time = -24h@h
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.mode = fast
+display.page.search.tab = statistics
+display.visualizations.charting.chart = bar
+display.visualizations.type = mapping
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```Attempt to detect when a Splunk scripted_input appears to be running at OS level even though disabled=1 is set\
+Splunk support have advised that the modular input scripts run by design, in fact disabled=1 means the inheriting input stanzas should be disabled, not that the modular input is disabled\
+As per the updated documentation on https://docs.splunk.com/Documentation/AddOns/released/Overview/Distributedinstall best practice is now to remove the inputs.conf and inputs.conf.spec on SH clusters...```\
+index=_introspection `localsearchheadhosts` sourcetype=splunk_resource_usage \
+| spath component \
+| search component=PerProcess data.process!=splunkd \
+| spath data.args \
+| rex field=data.args "[/\\\\](?P<title>[^/\\\\\.]+)\.[^\.]+" \
+| join title overwrite=false\
+ [| rest `splunkadmins_restmacro` /servicesNS/-/-/configs/conf-inputs \
+ | search title!="*://*" disabled=1 \
+ | table eai:acl.app, disabled, interval, title] \
+| rename eai:acl.app AS app \
+| stats max(_time) AS mostRecent, min(_time) AS firstSeen, values(host) AS hostList, values(data.status) AS status by app, interval, title, data.args, data.process, data.process_type \
+| eval mostRecent=strftime(mostRecent, "%+"), firstSeen=strftime(firstSeen, "%+") \
+| table title, app, firstSeen, mostRecent, hostList, interval, data.args, data.process, data.process_type, status
+disabled = 1
+
+[ForwarderLevel - Forwarders connecting to a single endpoint for extended periods]
+action.email.useNSSubject = 1
+alert.track = 0
+description = Experimental alert (report for now). This detects when a forwarder has spent an extended period of time connecting to a single endpoint; this suggests that the EVENT_BREAKER (on a UF) or a LINE_BREAKER (on a HF) may assist in forcing the connection to switch between endpoints more regularly.\
+Note there are multiple variables in this query that may need changing depending on your environment setup.
+dispatch.earliest_time = -24h@h
+dispatch.latest_time = now
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.show = 0
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```Detect forwarders stuck connecting to a single indexer or heavy forwarder for an extended period of time...Assuming more than 60 seconds of continuous traffic is a problem...this may need to be customised for your environment\
+Note that the number of metrics defaults to the top 10 measured every 30 seconds, so if this is customised you will need to change this alert```\
+index=_internal sourcetype=splunkd `splunkadmins_metrics_source` group=tcpin_connections Metrics `indexerhosts` OR `heavyforwarderhosts`\
+| eval ingest_pipe = if(isnotnull(ingest_pipe), ingest_pipe, "none")\
+| streamstats time_window=60s count by hostname, host, ingest_pipe\
+| where count>4\
+| eval combined=hostname . " host:" . host . " pipe:" . 
ingest_pipe\
+| timechart span=10m useother=false max(count) by combined
+
+[ForwarderLevel - Forwarders connecting to a single endpoint for extended periods UF level]
+action.email.useNSSubject = 1
+alert.track = 0
+description = Experimental alert (report for now). This detects when a forwarder has spent an extended period of time connecting to a single endpoint; this suggests that the EVENT_BREAKER may assist in forcing the connection to switch between endpoints more regularly.\
+Note there are multiple variables in this query that may need changing depending on your environment setup.
+dispatch.earliest_time = -24h@h
+dispatch.latest_time = now
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.show = 0
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```Detect forwarders stuck connecting to a single indexer or heavy forwarder for an extended period of time...Assuming more than 60 seconds of continuous traffic is a problem...this may need to be customised for your environment\
+Note that the number of metrics defaults to the top 10 measured every 30 seconds, so if this is customised you will need to change this alert```\
+index=_internal sourcetype=splunkd `splunkadmins_metrics_source` group=tcpin_connections Metrics `indexerhosts` OR `heavyforwarderhosts`\
+| eval ingest_pipe = if(isnotnull(ingest_pipe), ingest_pipe, "none")\
+| streamstats time_window=60s count by name, host, ingest_pipe\
+| where count>4\
+| eval combined=name . " host:" . host . " pipe:" . ingest_pipe\
+| timechart span=10m useother=false max(count) by combined
+
+[SearchHeadLevel - Determine query scan density]
+action.email.useNSSubject = 1
+alert.track = 0
+description = Report only? Yes. This query measures the approximate density of a search to determine if it's considered rare or dense.\
+This version provides an example query which can then be used to drill down into further details as required
+dispatch.earliest_time = -24h@h
+dispatch.latest_time = now
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.show = 0
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```Determine the query scan density of queries per-index. 
Excludes replicated search jobs (rsa)``` index=_audit `searchheadhosts` action=search sourcetype=audittrail search_id!="rsa_*"\
+| eval sname=if(isnull(savedsearch_name) OR savedsearch_name=="", search, savedsearch_name)\
+| stats list(search_type) as search_type, list(api_et) as api_et, list(api_lt) as api_lt, list(apiStartTime) as apiStartTime, list(apiEndTime) as apiEndTime, list(search_et) as search_et, list(search_lt) as search_lt, list(info) as status, list(total_run_time) as total_run_time, list(event_count) as event_count, list(considered_events) as considered_events, list(result_count) as result_count, list(scan_count) as scan_count, list(ttl) as ttl, list(is_realtime) as search_realtime_check, list(_time) as TimeAudited, list(sname) as sname by search_id\
+| where isnotnull(scan_count) AND NOT event_count="N/A" AND LIKE(sname, "%index%=%")\
+| search sname!="'typeahead prefix=*"\
+| rex field=sname "(?s)index(\s*=\s*|::)(?P<indexname>[^ \t]+)"\
+| eval indexname=replace(indexname, "'", ""), indexname=replace(indexname, "\"", "")\
+| stats avg(event_count) AS avgEventCount, avg(scan_count) AS avgScanCount, avg(result_count) AS avgResultCount by indexname\
+| eval scanDensity=(avgResultCount/avgScanCount)*100
+
+[IndexerLevel - Report on bucket corruption]
+action.email.useNSSubject = 1
+alert.track = 0
+description = Report only? Yes. Refer to IndexerLevel - Unclean Shutdown - Fsck for an alert for this issue, this just lists out all corrupt buckets. Bucket corruption is rare and in a clustered environment this should self-repair via the cluster master fixup list over time. If non-clustered refer to the documentation for splunk fsck, you may also wish to try "IndexerLevel - Corrupt buckets via DBInspect"
+dispatch.earliest_time = -24h@h
+dispatch.latest_time = now
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.show = 0
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```For the alert version of this report refer to IndexerLevel - Unclean Shutdown - Fsck, you may also wish to try "IndexerLevel - Corrupt buckets via DBInspect"\
+Attempt to find bucket corruption errors in the splunkd logs. This can also be found at search head level via the info.csv (not indexed by default). In a clustered environment a message such as \"06-12-2018 07:31:47.160 +0000 INFO ProcessTracker - (child_407__Fsck) Fsck - (entire bucket) Rebuild for bucket='/opt/splunk/var/lib/splunk/indexname/db/db_1528466340_1520517600_38_A25ECA32-B33E-4469-8C76-22190FDCC8CB' took 86.26 seconds\" may appear once the auto-repair has occurred. Finally it may appear if log_search_messages is set in limits.conf (enabled by default in 9.1 and above), the sourcetype will be splunk_search_messages```\
+index=_internal `indexerhosts` sourcetype=splunkd `splunkadmins_splunkd_source` IndexerService OR HotBucketRoller "corrupt" newly ```newly appears to show corruption, previously may be the term for when it is fixed...```\
+| stats values(Bucket) AS bucketList by idx \
+| eval bucketCount=mvcount(bucketList)\
+| addcoltotals
+display.events.fields = ["host","source","sourcetype"]
+display.visualizations.chartHeight = 628
+display.visualizations.charting.chart = line
+
+[SearchHeadLevel - Indexer Peer Connection Failures]
+alert.suppress = 0
+alert.track = 1
+counttype = number of events
+cron_schedule = 53 * * * *
+description = Chance the alert requires action? Moderate. 
For some reason one or more peers are failing to respond to the search heads, which may impact search results. Any failure will be reporting an error either to the end user or to a scheduled search. Note this alert requires the splunk_search_messages sourcetype (or search.log), i.e. the [search]\
+log_search_messages = true\
+setting in the limits.conf file, which produces the search_messages.log file. Note this is enabled in 9.1 by default
+dispatch.earliest_time = -60m@m
+dispatch.latest_time = now
+display.general.type = statistics
+display.page.search.tab = statistics
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```Further testing required. Detect failures from the search.log advising that the peer was unable to send a response, for example "This can be caused by the peer unexpectedly closing or resetting the connection. Search results might be incomplete!"...This requires the search.log messages (see description of this alert) to obtain the splunk_search_messages sourcetype. The "Unable to distribute to peer named ... status=Down" scenario can also result from having many indexers and may require an increase to the timeouts in distsearch.conf\
+info.csv often reports failures as well but sometimes these are not in search.log and vice-versa. Unable to distribute to peer/Connection failed/Unable to determine response all appear to be some kind of failure. Attempting to use the [search] log_search_messages = true in the limits.conf file and then use the search_messages.log file to find what would normally appear in info.csv in the dispatch directory per-search...Note this is enabled by default in 9.1 and above```\
+index=_internal sourcetype=splunk_server_messages source!="*rsa_scheduler_*" `searchheadhosts` ("error" "for peer") OR "Error connecting" OR "Got status" ```Ignoring "HTTP error status message from" OR "HTTP client error" as they tend to appear when one of the previous examples is there...```\
+| rex "for peer (?P<peer>[^\.]+)"\
+| rex "ERROR\s+\S+\s+-\s+(sid:[^ ]+)?(?P<message>.*)"\
+| bin _time span=1m\
+| eval msgpeer = host + source + peer + _time\
+| rex field=host "(?P<host>[^\.]+)"\
+| stats dc(msgpeer) AS count, dc(eval(searchmatch("source=*scheduler_*"))) AS schedulerCount, values(host) AS reportingHost, values(message) AS message by peer, _time\
+| eval errorFrom="splunk_server_messages"\
+| append\
+ [ search index=_internal sourcetype=splunk_search_messages orig_component="DispatchThread" `searchheadhosts` "Connection failed" OR "Unable to determine response" OR "Unable to distribute to peer"\
+ | rex ",\"(\[[^\]]+\]\[[^\]]+\]: )?\[(?P<peer>[^\.\]]+).*?\] (?P<message>[^\"]+)"\
+ | rex "Unable to distribute to peer named .* (?P<message>because.*?)\","\
+ | rex field=uri "(?P<IP>[^:]+)"\
+ | lookup dnslookup clientip as IP OUTPUT clienthost AS peer\
+ | rex field=peer "(?P<peer>[^\.]+)"\
+ | bin _time span=1m\
+ | eval msgpeer = host + source + peer + _time\
+ | rex field=host "(?P<host>[^\.]+)"\
+ | stats dc(msgpeer) AS count, dc(eval(searchmatch("source=*scheduler_*"))) AS schedulerCount, values(host) AS reportingHost, values(message) AS message by peer, _time\
+ | eval errorFrom="splunk_search_messages"\
+ ]\
+| stats sum(count) AS count, sum(schedulerCount) AS schedulerCount, values(reportingHost) AS reportingHost, values(message) AS message, values(errorFrom) AS errorFrom by peer, _time\
+| eval countAndSchedulerCount = count . " / " . 
schedulerCount\ +| table _time, peer, reportingHost, countAndSchedulerCount, message, errorFrom\ +| sort - _time +disabled = 1 + +[SearchHeadLevel - Detect searches hitting corrupt buckets] +action.email.useNSSubject = 1 +alert.track = 0 +description = Report only? Yes. This query checks for searches that have found a corrupt bucket in the environment, this does require the limits.conf setting log_search_messages=true if below version 9.1 +dispatch.earliest_time = -24h@h +dispatch.latest_time = now +display.general.type = statistics +display.page.search.tab = statistics +display.visualizations.show = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = ```Note this requires further testing due to switch to splunk_search_messages\ +Attempt to find corrupt buckets appearing in the search heads dispatch/info.csv files, this will show that a user is seeing the "data may be corrupt" messages\ +In a clustered environment this should auto-repair via the cluster master fixup list, messages such as \"06-12-2018 07:31:47.160 +0000 INFO ProcessTracker - (child_407__Fsck) Fsck - (entire bucket) Rebuild for bucket='/opt/splunk/var/lib/splunk/indexname/db/db_1528466340_1520517600_38_A25ECA32-B33E-4469-8C76-22190FDCC8CB' took 86.26 seconds.\" should appear in the splunkd logs. In a non-clustered environment refer to the Splunk fsck documentation. You may also wish to try IndexerLevel - Corrupt buckets via DBInspect```\ +index=_internal sourcetype=splunk_search_messages "corrupt" OR "corrupted" OR "Consider running fsck" `searchheadhosts` \ +| rex ",\"(\[[^\]]+\]\[[^\]]+\]: )?\[(?P<peer>[^\]]+)"\ +| rex "message=\[(?P<peer>[^\]]+)" \ +| rex "path='(?P<diskloc>[^']+)" \ +| rex "files in '(?P<diskloc>[^']+)" \ +| rex field=diskloc ".*/(?P<bucket>[^/]+)$" \ +| fillnull value="Unknown" diskloc peer bucket \ +| stats values(host) AS reportingHost, max(_time) AS mostRecent, first(_raw) AS raw by bucket, diskloc, peer\ +| sort - mostRecent\ +| eval mostRecent=strftime(mostRecent, "%+")\ +| table mostRecent, diskloc, peer, reportingHost, bucket, raw + +[SearchHeadLevel - Users exceeding the disk quota introspection] +action.email.useNSSubject = 1 +alert.track = 0 +cron_schedule = 11 6,10,14,18,22,2 * * * +description = Chance the alert requires action? High. One or more users have reached the disk quota limit and may not be aware of this... Can be fixed by the end user? Yes. This version requires sendresults and customisation to work as expected.\ +Also refer to SearchHeadLevel - Users exceeding the disk quota introspection cleanup for the lookup cleaner. For testing you may wish to hardcode the email_to field and remove all lines after the line starting with table app... +dispatch.earliest_time = -4h@h +dispatch.latest_time = now +display.general.type = statistics +display.page.search.tab = statistics +display.visualizations.show = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +schedule_window = 30 +search = ```The listed users have reached the maximum disk quota, they may be unaware so it is best to let them know about this issue...\ +Note that the REST API call accesses the jobs list which can expire for ad-hoc jobs in 10 minutes, the introspection index has data for a longer period of time however it's not as accurate as the rest version. 
This alert also outputs the emailed users to a lookup so they don't continue to receive this same email without some kind of throttle per-user\
+The REST version is called "SearchHeadLevel - Users exceeding the disk quota" and is search head specific```\
+index=_internal sourcetype=splunkd `splunkenterprisehosts` (`splunkadmins_splunkd_source`) "maximum disk usage quota" `splunkadmins_users_exceeding_diskquota` NOT [| inputlookup splunkusersexceedingdiskquota.csv | fields username ]\
+| stats max(_time) AS mostRecent by username, reason, host\
+| rename username AS username_from_search\
+| eval mostRecent = strftime(mostRecent, "%+")\
+ ```For each result we find we're going to run the map command to send an email to each individual user who has had the issue, if they have been emailed before the inputlookup will exclude them. Username renamed due to issues with the username variable in a scheduled search```\
+| eval body="Why am I receiving this? <br />" + reason + "<br /><br /> This occurred on host " + host + "<br /><br />The issue was last noticed on " + mostRecent + "<br /><br />" + "Your top 20 searches are listed below" + "<br /><br />"\
+ ```The below is the attempt to include the largest jobs by querying the introspection index. If we use map without the appendpipe we lose parts of the original search we need. The initial workaround of makeresults and eval commands did work but this seemed slightly cleaner. Although there would be other ways to do this...```\
+| head 30\
+| append [ | makeresults | eval username_from_search="workaround for map errors", body="to pass appinspect" ]\
+| appendpipe\
+ [| map\
+ [ search ```The introspection data provides a written_mb field which is not going to advise an accurate real-disk usage for a job, but it does provide an estimate and provides data for longer than a 10 minute time period for ad-hoc jobs...the | rest version is the alternative available which is more accurate but may return zero results if this is not run at least every 10 minutes, it also must run on the same search head cluster as the disk usage quota, unlike this introspection version. 
max(data.written_mb) was used instead of last() as sometimes the quota kicks in before the temporary files are removed.```\
+ index=_introspection "data.search_props.role"=head data.written_mb>1 sourcetype=splunk_resource_usage "data.search_props.user"=$username_from_search$\
+ | stats max(data.written_mb) AS MBwritten, last(data.elapsed) AS approxDuration, max(_time) AS searchLatestTime, min(_time) AS searchEarliestTime by "data.search_props.app", "data.search_props.provenance", "data.search_props.type", "data.search_props.sid"\
+ | stats count, sum(MBwritten) AS MBwritten, max(approxDuration) AS approxDuration, max(searchLatestTime) AS searchLatestTime, min(searchEarliestTime) AS searchEarliestTime by "data.search_props.app", "data.search_props.provenance", "data.search_props.type"\
+ | eval searchLatestTime=strftime(searchLatestTime, "%+"), searchEarliestTime=strftime(searchEarliestTime, "%+")\
+ | rename data.search_props.app AS app, data.search_props.provenance AS dashboardURLorSearchName, data.search_props.type AS type\
+ | sort - MBwritten\
+ | eval approxDuration=substr(tostring(approxDuration,"duration"),0,8)\
+ ```| appendcols\
+ [ At this point you need to either lookup the username to email translation, here's an example using ldapsearch: ldapsearch search=\"(&(CN=$username_from_search$)(objectClass=organizationalPerson))" attrs=mail | fields mail ] ```\
+ | eval email_to=mail\
+ | fields - mail\
+ ```Remove search/comment and replace <EQ> with equals to use sendresults to make this fully automated. AppInspect badge does not allow dependencies. sendresults subject="Splunk Disk Quota Exceeded" body=$body$ msgstyle="table {font-family:Arial;font-size:12px;border: 1px solid black;padding:3px}th {background-color:#AAAAAA;color:#fff;border-left: solid 1px #e9e9e9} td {border:solid 1px #e9e9e9}" showemail=f```\
+ | head 20 ] maxsearches=30\
+ ]\
+| where username_from_search!="workaround for map errors"\
+| table app, MBwritten, dashboardURLorSearchName, approxDuration, searchEarliestTime, searchLatestTime, username_from_search, type, count, email_to\
+| fields username_from_search \
+| rename username_from_search AS username\
+| eval currtime=now()\
+| fields currtime, username\
+| where isnotnull(username)\
+| outputlookup splunkusersexceedingdiskquota.csv append=true
+
+[SearchHeadLevel - Users exceeding the disk quota introspection cleanup]
+action.email.useNSSubject = 1
+alert.track = 0
+cron_schedule = 53 4,10,16,22 * * *
+description = Relates to the alert SearchHeadLevel - Users exceeding the disk quota introspection\
+This report cleans up the lookup file created by the disk quota alert so that users will re-receive the alert after a period of time. Hardcoded to 1 week for now
+dispatch.earliest_time = -24h@h
+dispatch.latest_time = now
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.show = 0
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+schedule_window = 30
+search = | inputlookup splunkusersexceedingdiskquota.csv \
+| where currtime > now() - (7*60*60*24) \
+| outputlookup splunkusersexceedingdiskquota.csv
+
+[IndexerLevel - Timestamp parsing issues combined alert]
+alert.suppress = 0
+alert.track = 1
+counttype = number of events
+cron_schedule = 44 4 * * *
+description = Chance the alert requires action? High. Find timestamp parsing issues and provide a report on the issues. Also refer to Mark Runal's blog for the original query or his app via https://splunkbase.splunk.com/app/1848/
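+# Remediation for most of these issues is a props.conf timestamp definition for the offending
+# sourcetype. A hedged sketch only (hypothetical sourcetype; adjust the prefix/format to the actual events):
+# [my:custom:sourcetype]
+# TIME_PREFIX = ^
+# TIME_FORMAT = %Y-%m-%d %H:%M:%S
+# MAX_TIMESTAMP_LOOKAHEAD = 19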
+dispatch.earliest_time = -24h@h
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.chartHeight = 628
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```As found on https://runals.blogspot.com/2014/05/splunk-dateparserverbose-logs-part-2.html with minor modifications. Further queries available in the app https://splunkbase.splunk.com/app/1848/```\
+index=_internal DateParserVerbose `heavyforwarderhosts` OR `indexerhosts` sourcetype=splunkd `splunkadmins_splunkd_source`\
+| rex "source(?:=|::)(?<Source>[^\|]+)\|host(?:=|::)(?<Host>[^\|]+)\|(?<Sourcetype>[^\|]+)"\
+| rex "(?<msgs_suppressed>\d+) similar messages suppressed."\
+| eval Issue = case(like(_raw, "%too far away from the previous event's time%"), "Variability in date/event timestamp", like(_raw, "%suspiciously far away from the previous event's time%"), "Variability in date/event timestamp", like(_raw, "%outside of the acceptable time window%"), "Timestamp is too far outside acceptable time window", like(_raw, "%Failed to parse timestamp%"), "Reverting to last known good timestamp", like(_raw, "%Accepted time format has changed%"), "Attempting to learn new timestamp format", like(_raw, "%The same timestamp has been used%"), "More than 100k+ events have the same timestamp", 1=1, "fixme")\
+| stats count sum(msgs_suppressed) as "Duplicate Messages Suppressed" by Sourcetype Issue Host Source\
+| stats sum(count) as count dc(Host) as Host_count, dc(Source) as Sources, sum("Duplicate Messages Suppressed") as "Duplicate Messages Suppressed", values(Host) AS Hosts by Sourcetype Issue \
+| eval "Total Count"='Duplicate Messages Suppressed' + count\
+| stats sum("Total Count") as "Total Count", list(Issue) as Issues, values(Hosts) as Hosts, list(Host_count) AS Host_count list(Sources) as Sources, list("Duplicate Messages Suppressed") as "Duplicate Messages Suppressed" by Sourcetype \
+| sort - "Total Count"
+disabled = 1
+
+[SearchHeadLevel - Audit log search example only]
+action.email.useNSSubject = 1
+alert.track = 0
+description = Report only? Yes. This is just an example for querying the audit logs, provided for reference only
+dispatch.earliest_time = -24h@h
+dispatch.latest_time = now
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.show = 0
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```Query the audit logs for information about earliest/latest time and the search used. However the issue appears to be that apiStart/EndTime can be overridden by earliest/latest keywords *and* the auto-extracted search field isn't accurate and a savedsearch_name of search<number> is actually a dashboard (which can be seen through introspection but not through audit searches!)```\
+index=_audit `searchheadhosts` action=search info=granted search=* NOT "search='typeahead prefix"\
+| rex "(?m)search='(?P<thesearch>[\S\s]+)',\s+autojoin="\
+| table _time, ttl, user, apiStartTime, apiEndTime, earliest, latest, savedsearch_name, thesearch\
+| sort - _time
+
+[IndexerLevel - Buckets changes per day]
+action.email.useNSSubject = 1
+alert.track = 0
+description = Report only? Yes. 
Attempt to count the number of buckets added/removed within an indexer cluster in order to forecast potential capacity issues with the cluster master +dispatch.earliest_time = -24h@h +dispatch.latest_time = now +display.general.type = statistics +display.page.search.tab = statistics +display.visualizations.show = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = ```May not be 100% accurate, still under testing. deleteBucket frozen=false is usually excess bucket removal```\ +index=_internal "Creating hot bucket" OR ((CMSlave deleteBucket) AND "frozen=false") OR "freeze succeeded" sourcetype=splunkd `splunkadmins_splunkd_source` `indexerhosts`\ +```Multiplying by replication factor as each hot bucket is duplicated\ +To split by index\ +| rex "(/[^/]+){2}/(?P<idx2>[^/]+)"\ +| eval idx=if(isnull(idx),idx2,idx)\ +```\ +| timechart span=1d count(eval(searchmatch("Creating hot bucket"))) AS created, count(eval(searchmatch("((CMSlave deleteBucket) AND \"frozen=false\") OR \"freeze succeeded\""))) AS frozenCount\ +| eval change=(created*`splunkadmins_replicationfactor`)-frozenCount\ +| fields - created, frozenCount + +[What Access Do I Have Without REST?] +action.keyindicator.invert = 0 +alert.track = 0 +description = Report only? Yes. Determine the access of the currently logged in user assuming they cannot run REST queries against the indexers. Search Head specific? Yes. Please open in search and re-execute to make this work... +dispatch.earliest_time = @d +dispatch.latest_time = now +display.general.timeRangePicker.show = 0 +display.general.type = statistics +display.page.search.tab = statistics +display.visualizations.show = 0 +request.ui_dispatch_app = Global +request.ui_dispatch_view = search +search = | `whataccessdoihave` + +[SearchHeadLevel - Users with auto-finalized searches] +action.keyindicator.invert = 0 +alert.track = 0 +description = Report only? Yes. Determine who has had a search auto-finalized due to time or disk quota. This does require the limits.conf setting log_search_messages=true if below version 9.1 +dispatch.earliest_time = @d +dispatch.latest_time = now +display.general.timeRangePicker.show = 0 +display.general.type = statistics +display.page.search.tab = statistics +display.visualizations.show = 0 +request.ui_dispatch_app = Global +request.ui_dispatch_view = search +search = ```Find searches which have been auto-finalized and the search contents which was running when the search was auto-finalized. Very similar to Users Exceeding the disk quota however this covers both disk quota and srchMaxTime. 
Note this alert requires further testing\
+This does require the limits.conf log_search_messages=true setting to be enabled to work, if below version 9.1```\
+index=_internal `searchheadhosts` sourcetype=splunk_search_messages auto-finalized\
+ ```for even more info...OR canceled OR auto-canceled OR cancelled``` \
+| rex field=source "[/\\\]dispatch[/\\\](?P<sid>[^/\\\]+)"\
+| rex "(?P<message>auto-finalized[^\"]+)"\
+| fillnull message value="Unknown"\
+| append [ | makeresults | eval sid="workaround for map errors", message="to pass appinspect" ]\
+| map\
+ [ search index=_audit `searchheadhosts` "info=granted" "search_id='$sid$'"\
+ | rex "(?s), search='(?P<search>.*)\]$" \
+ | eval search=substr(search,0,100)\
+ | eval message=$message$ ] maxsearches=50\
+| stats values(timestamp) AS time, values(message) AS message, values(search) AS search, values(apiStartTime) AS startTime, values(apiEndTime) AS endTime, values(savedsearch_name) AS savedsearch_name by user
+
+[SearchHeadLevel - Search Queries Per Day Audit Logs]
+action.email.useNSSubject = 1
+alert.track = 0
+description = Report only? Yes. A query to list the number of searches per day by type of search (dashboard/saved search/ad-hoc)
+dispatch.earliest_time = -2d@d
+dispatch.latest_time = @d
+display.events.fields = ["index","sourcetype","host"]
+display.events.list.drilldown = none
+display.events.list.wrap = 0
+display.events.maxLines = 100
+display.events.raw.drilldown = none
+display.events.rowNumbers = 1
+display.events.table.drilldown = 0
+display.general.type = statistics
+display.page.search.tab = statistics
+display.statistics.drilldown = none
+display.statistics.wrap = 0
+display.visualizations.charting.chart = line
+display.visualizations.show = 0
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```Count the number of non-system Splunk queries that use a search command, excludes rest/metrics/data model acceleration et cetera```\
+index=_audit `searchheadhosts` ", info=granted " "search='search " search_id!="'SummaryDirector_*" search_id!="'rsa_*" user!=admin user!=splunk-system-user \
+| bin _time span=1d \
+| rex "info=granted , search_id='(?P<search_id>[^']+)"\
+| rex "', savedsearch_name=\"(?P<savedsearch_name>[^\"]*)"\
+| `search_type_from_sid(search_id)` \
+| stats dc(user) AS activeUserCount, count AS totalSearchCount, count(eval(type=="scheduled")) AS savedsearchCount, count(eval(type=="dashboard")) AS searchesFromDashboards, count(eval(type=="ad-hoc")) AS adhocSearches by _time
+
+[SearchHeadLevel - Search Queries By Type Audit Logs]
+action.email.useNSSubject = 1
+alert.track = 0
+description = Report only? Yes. A pie graph to show statistics on the number of searches by type of search (index specified, index wildcard used) et cetera, "SearchHeadLevel - Search Queries By Type Audit Logs macro version" includes macro substitution but is otherwise the same report. 
Refer to "SearchHeadLevel - Searches by search type" for a simplified version +dispatch.earliest_time = -60m@m +dispatch.latest_time = now +display.events.fields = ["index","sourcetype","host"] +display.events.list.drilldown = none +display.events.list.wrap = 0 +display.events.maxLines = 100 +display.events.raw.drilldown = none +display.events.rowNumbers = 1 +display.events.table.drilldown = 0 +display.general.type = visualizations +display.page.search.tab = visualizations +display.statistics.drilldown = none +display.statistics.wrap = 0 +display.visualizations.charting.chart = pie +display.visualizations.show = 0 +display.visualizations.trellis.splitBy = _aggregation +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = ```Based on the audit logs attempt to determine which types of searches are running and provide a rough % for each one```\ + index=_audit ", info=granted " `searchheadhosts` "search='" search_id!="'rsa_*"\ +| rex "(?s), search='(?P<search>.*)\]$" \ +| rex field=search "^(\s*\|)?(?P<searchbeforepipe>[^|]+)" \ +| rex mode=sed field=searchbeforepipe "s/search \(index=\* OR index=_\*\) index=/search index=/"\ +| rex mode=sed field=searchbeforepipe "s/search index=\s*\S+\s+index=/search index=/"\ +| eval indexNotSpecified = if(NOT match(searchbeforepipe,"(index(::|\s*=))|(index\s*IN)") AND match(searchbeforepipe,"^\s*search "),"1","0")\ +| eval macroWithIndexClause = if(isnotnull(searchbeforepipe) AND (match(searchbeforepipe,"(?s)^\s*search\s.*(index(\s*=|::)|(index\s*IN))") AND match(searchbeforepipe,"`")),"1","0")\ +| stats count, count(eval(match(searchbeforepipe,"(index(\s*=|::))|(index\s*IN)"))) AS indexClause, count(eval(match(searchbeforepipe,"(index(\s*=|::)\s*\S*\*)|(index\s+IN\s*\([^\)]*\*)"))) AS indexWildcard, count(eval(match(searchbeforepipe,"\`[^\`]+\`"))) AS macroNoIndex, count(eval(match(search,"^\s*\|\s*summarize"))) AS summarize, count(eval(match(search,"(?i)^\s*\|\s*savedsearch"))) AS savedsearch, count(eval(match(search,"(?i)^\s*\|\s*(from\s*)?datamodel"))) AS datamodel, count(eval(match(search,"(?i)^\s*\|\s*loadjob"))) AS loadjob, count(eval(match(search,"(?i)^\s*\|\s*(multisearch|union)"))) AS multisearch, count(eval(match(search,"(?i)^\s*\|\s*(pivot)"))) AS pivot, count(eval(match(search,"(?i)^\s*\|\s*(metadata)"))) AS metadata, count(eval(indexNotSpecified==1)) AS indexNotSpecified, count(eval(macroWithIndexClause==1)) AS macroWithIndexClause, count(eval(match(search,"(?i)^\s*\|\s*(tstats)"))) AS tstats, count(eval(match(search,"(?i)^\s*\|\s*(rest)"))) AS rest, count(eval(match(search,"(?i)^\s*\|\s*(mcatalog|mstats)"))) AS metrics, count(eval(match(search,"(?i)^\s*\|\s*(from\s+)?inputlookup"))) AS inputlookup, count(eval(match(search_id,"^'ta_"))) AS typeahead\ +| eval macroNoIndex = macroNoIndex-macroWithIndexClause, indexClause = indexClause - indexWildcard\ +| eval unknown = count - (indexClause + macroNoIndex + summarize + savedsearch + datamodel + loadjob + multisearch + pivot + metadata + indexNotSpecified + tstats + rest + metrics + inputlookup + typeahead)\ +| fields - macroWithIndexClause, count\ +| transpose column_name="xaxis" header_field="perc" + +[SearchHeadLevel - Search Queries By Type Audit Logs macro version] +action.email.useNSSubject = 1 +alert.track = 0 +description = Report only? Yes. 
A pie graph to show statistics on the number of searches by type of search (index specified, index wildcard used) et cetera this version attempts to substitute macros, "SearchHeadLevel - Search Queries By Type Audit Logs" does not include macro substitution but is otherwise the same report. Requires "SearchHeadLevel - Macro report". Refer to "SearchHeadLevel - Searches by search type" for a simplified version +dispatch.earliest_time = -60m@m +dispatch.latest_time = now +display.events.fields = ["index","sourcetype","host"] +display.events.list.drilldown = none +display.events.list.wrap = 0 +display.events.maxLines = 100 +display.events.raw.drilldown = none +display.events.rowNumbers = 1 +display.events.table.drilldown = 0 +display.general.type = visualizations +display.page.search.tab = visualizations +display.statistics.drilldown = none +display.statistics.show = 0 +display.statistics.wrap = 0 +display.visualizations.charting.chart = pie +display.visualizations.trellis.splitBy = _aggregation +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = ```Based on the audit logs attempt to determine which types of searches are running and provide a rough % for each one```\ + index=_audit `searchheadhosts` ", info=granted " "search='" search_id!="'rsa_*"\ +| rex "(?s), search='(?P<search>.*)\]$" \ +| `splunkadmins_macro_sub("search")` \ +| `splunkadmins_macro_sub("search")` \ +| rex field=search "(?s)^(\s*\|)?(?P<searchbeforepipe>[^|]+)" \ +| rex mode=sed field=searchbeforepipe "s/search \(index=\* OR index=_\*\) index=/search index=/" \ +| rex mode=sed field=searchbeforepipe "s/search index=\s*\S+\s+index=/search index=/" \ +| eval indexNotSpecified = if(NOT match(searchbeforepipe,"(index(\s*=|::))|(index\s*IN)") AND match(searchbeforepipe,"^\s*search "),"1","0")\ +| eval macroWithIndexClause = if(isnotnull(searchbeforepipe) AND (match(searchbeforepipe,"(?s)^\s*search\s.*(index(\s*=|::))|(index\s*IN)") AND hasMacro=="1"),"1","0")\ +| stats count, count(eval(match(searchbeforepipe,"(index(\s*=|::))|(index\s*IN)"))) AS indexClause, count(eval(match(searchbeforepipe,"(index(\s*=\s*|::)\S*\*)|(index\s+IN\s*\([^\)]*\*)"))) AS indexWildcard, count(eval(hasMacro=="1")) AS macroNoIndex, count(eval(match(search,"^\s*\|\s*summarize"))) AS summarize, count(eval(match(search,"(?i)^\s*\|\s*savedsearch"))) AS savedsearch, count(eval(match(search,"(?i)^\s*\|\s*(from\s*)?datamodel"))) AS datamodel, count(eval(match(search,"(?i)^\s*\|\s*loadjob"))) AS loadjob, count(eval(match(search,"(?i)^\s*\|\s*(multisearch|union)"))) AS multisearch, count(eval(match(search,"(?i)^\s*\|\s*(pivot)"))) AS pivot, count(eval(match(search,"(?i)^\s*\|\s*(metadata)"))) AS metadata, count(eval(indexNotSpecified==1)) AS indexNotSpecified, count(eval(macroWithIndexClause==1)) AS macroWithIndexClause, count(eval(match(search,"(?i)^\s*\|\s*(tstats)"))) AS tstats, count(eval(match(search,"(?i)^\s*\|\s*(rest)"))) AS rest, count(eval(match(search,"(?i)^\s*\|\s*(mcatalog|mstats)"))) AS metrics, count(eval(match(search,"(?i)^\s*\|\s*(from\s+)?inputlookup"))) AS inputlookup, count(eval(match(search_id,"^'ta_"))) AS typeahead\ +| eval indexClause = indexClause-indexWildcard, macroNoIndex = macroNoIndex-macroWithIndexClause\ +| eval total = indexClause + indexWildcard + macroNoIndex + summarize + savedsearch + datamodel + loadjob + multisearch + pivot + metadata + indexNotSpecified + tstats + rest + metrics + typeahead\ +| eval unknown = count - total\ +| fields - total, searchcommandcount, macroWithIndexClause, 
count\ +| transpose column_name="xaxis" header_field="perc" + +[SearchHeadLevel - Search Queries By Type Audit Logs macro version other] +action.email.useNSSubject = 1 +alert.track = 0 +description = Report only? Yes. This report relates to the "SearchHeadLevel - Search Queries By Type Audit Logs" and equivalent macro version but exists to print out the entries that did not fit into any of the categories. Requires "SearchHeadLevel - Macro report". +dispatch.earliest_time = -24h@h +dispatch.latest_time = now +display.events.fields = ["index","sourcetype","host"] +display.events.list.drilldown = none +display.events.list.wrap = 0 +display.events.maxLines = 100 +display.events.raw.drilldown = none +display.events.rowNumbers = 1 +display.events.table.drilldown = 0 +display.general.type = statistics +display.page.search.tab = statistics +display.statistics.drilldown = none +display.statistics.wrap = 0 +display.visualizations.charting.chart = pie +display.visualizations.show = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = ```Based on the audit logs attempt to determine which types of searches are running and provide a rough % for each one```\ + index=_audit `searchheadhosts` ", info=granted " "search='" search_id!="'rsa_*"\ +| rex "(?s), search='(?P<search>.*)\]$" \ +| `splunkadmins_macro_sub("search")` \ +| `splunkadmins_macro_sub("search")` \ +| rex field=search "(?s)^(\s*\|)?(?P<searchbeforepipe>[^|]+)"\ +| rex mode=sed field=searchbeforepipe "s/search \(index=\* OR index=_\*\) index=/search index=/"\ +| rex mode=sed field=searchbeforepipe "s/search index=\s*\S+\s+index=/search index=/"\ +| eval indexNotSpecified = if(NOT match(searchbeforepipe,"(index(\s*=|::))|(index\s*IN)") AND match(searchbeforepipe,"^\s*search "),"1","0")\ +| eval macroWithIndexClause = if(isnotnull(searchbeforepipe) AND (match(searchbeforepipe,"(?s)^\s*search\s.*(index(\s*=|::))|(index\s*IN)") AND hasMacro=="1"),"1","0")\ +| eval indexClause = if(match(searchbeforepipe,"(index(\s*=|::))|(index\s*IN)"),"1","0")\ +| eval indexWildcard = if(match(searchbeforepipe,"(index(\s*=\s*|::)\S*\*)|(index\s+IN\s*\([^\)]*\*)"),"1","0")\ +| eval macroNoIndex = if(hasMacro=="1","1","0")\ +| eval summarize = if(match(search,"^\s*\|\s*summarize"),"1","0")\ +| eval savedsearch = if(match(search,"(?i)^\s*\|\s*(from\s+)?savedsearch"),"1","0")\ +| eval datamodel = if(match(search,"(?i)^\s*\|\s*(from\s+)?datamodel"),"1","0")\ +| eval loadjob = if(match(search,"(?i)^\s*\|\s*loadjob"),"1","0")\ +| eval multisearch = if(match(search,"(?i)^\s*\|\s*(multisearch|union)"),"1","0")\ +| eval pivot = if(match(search,"(?i)^\s*\|\s*(pivot)"),"1","0")\ +| eval metadata = if(match(search,"(?i)^\s*\|\s*(metadata)"),"1","0")\ +| eval tstats = if(match(search,"(?i)^\s*\|\s*(tstats)"),"1","0")\ +| eval rest = if(match(search,"(?i)^\s*\|\s*(rest)"),"1","0")\ +| eval inputlookup = if(match(search,"(?i)^\s*\|\s*(from\s+)?inputlookup"),"1","0")\ +| eval metrics = if(match(search,"(?i)^\s*\|\s*(mcatalog|mstats)"),"1","0")\ +| eval typeahead = if(match(search_id,"^'ta_"),"1","0")\ +| search indexClause=0 AND indexWildcard=0 AND macroNoIndex=0 AND summarize=0 AND savedsearch=0 AND datamodel=0 AND loadjob=0 AND pivot=0 AND multisearch=0 AND metadata=0 AND indexNotSpecified=0 AND macroWithIndexClause=0 AND tstats=0 AND rest=0 AND inputlookup=0 AND metrics=0 AND typeahead=0\ +| cluster field=search t=0.01 showcount=true\ +| table search, cluster_count + +[SearchHeadLevel - Role access list by user] +action.email.useNSSubject = 1 
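+# The REST endpoints used below can also be queried ad-hoc to inspect a single role before
+# running the full report, e.g. (a sketch only; the role name is an example):
+# | rest /services/authorization/roles splunk_server="local"
+# | search title="power"
+# | table title, srchIndexesAllowed, srchIndexesDefault, imported_srchIndexesAllowed, imported_srchIndexesDefault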
+alert.track = 0
+description = Report only? Yes. This report outputs a list of users, which roles they are in, and which srchIndexesDefault and srchIndexesAllowed values apply to each particular user; this does not list individual index names, the report "SearchHeadLevel - Index access list by user" exists to list index names...
+dispatch.earliest_time = -5m
+dispatch.latest_time = now
+display.events.fields = ["index","sourcetype","host"]
+display.events.list.drilldown = none
+display.events.list.wrap = 0
+display.events.maxLines = 100
+display.events.raw.drilldown = none
+display.events.rowNumbers = 1
+display.events.table.drilldown = 0
+display.general.type = statistics
+display.page.search.tab = statistics
+display.statistics.drilldown = none
+display.statistics.wrap = 0
+display.visualizations.charting.chart = area
+display.visualizations.show = 0
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = | rest /services/authentication/users splunk_server="local" \
+| eval comment="This search aims to provide a giant list of users and a list of wildcards (or index names if specified in the Splunk config) for srchIndexesAllowed/srchIndexesDefault" \
+| table title roles\
+| append [ | makeresults | eval title="splunk-system-user", roles="admin" ]\
+| rename title as user \
+| mvexpand roles \
+| join type=left roles \
+ [ rest /services/authorization/roles splunk_server="local" \
+ | table title, srchIndexesAllowed, srchIndexesDefault, imported_srchIndexesAllowed, imported_srchIndexesDefault\
+ | rename title as roles] \
+| makemv srchIndexesAllowed tokenizer=(\S+) \
+| makemv srchIndexesDefault tokenizer=(\S+)\
+| makemv imported_srchIndexesAllowed tokenizer=(\S+)\
+| makemv imported_srchIndexesDefault tokenizer=(\S+)\
+| eval srchIndexesAllowed = mvappend(srchIndexesAllowed, imported_srchIndexesAllowed)\
+| eval srchIndexesDefault = mvappend(srchIndexesDefault, imported_srchIndexesDefault)\
+| fillnull srchIndexesDefault, srchIndexesAllowed value="removeme"\
+| mvexpand srchIndexesAllowed\
+| eval srchIndexesAllowed=if(srchIndexesAllowed=="removeme",null(),srchIndexesAllowed)\
+| fields srchIndexesAllowed, srchIndexesDefault, user\
+| stats values(*) as * by user\
+| mvexpand srchIndexesDefault\
+| eval srchIndexesDefault=if(srchIndexesDefault=="removeme",null(),srchIndexesDefault)\
+| stats values(*) as * by user
+
+[SearchHeadLevel - Index access list by user]
+action.email.useNSSubject = 1
+alert.track = 0
+description = Report only? Yes. 
This report outputs a list of indexes available on a per-user basis which is used by another report, "SearchHeadLevel - Search Queries summary non-exact match", requires report "SearchHeadLevel - Index list report" +dispatch.earliest_time = -5m +dispatch.latest_time = now +display.events.fields = ["index","sourcetype","host"] +display.events.list.drilldown = none +display.events.list.wrap = 0 +display.events.maxLines = 100 +display.events.raw.drilldown = none +display.events.rowNumbers = 1 +display.events.table.drilldown = 0 +display.general.type = statistics +display.page.search.mode = fast +display.page.search.tab = statistics +display.statistics.drilldown = none +display.statistics.wrap = 0 +display.visualizations.charting.chart = area +display.visualizations.show = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = | rest /services/authorization/roles splunk_server="local" \ +| eval comment="This search aims to provide a giant list of users and what indexes they have access to (as in a list of index names, not a list of wildcards). Due to mvexpand hitting memory limits in the environment this alternative version runs many subsearches that do not hit the memory limits" \ +| table title, srchIndexesAllowed, srchIndexesDefault, imported_srchIndexesAllowed, imported_srchIndexesDefault \ +| rename title as roles \ +| makemv srchIndexesAllowed tokenizer=(\S+) \ +| makemv srchIndexesDefault tokenizer=(\S+) \ +| makemv imported_srchIndexesAllowed tokenizer=(\S+) \ +| makemv imported_srchIndexesDefault tokenizer=(\S+) \ +| eval srchIndexesAllowed = mvappend(srchIndexesAllowed, imported_srchIndexesAllowed) \ +| eval srchIndexesDefault = mvappend(srchIndexesDefault, imported_srchIndexesDefault) \ +| fillnull srchIndexesDefault, srchIndexesAllowed value="requiredformvexpand" \ +| mvexpand srchIndexesAllowed \ +| eval srchIndexesAllowed=if(srchIndexesAllowed=="requiredformvexpand",null(),srchIndexesAllowed) \ +| eval srchIndexesAllowed=lower(srchIndexesAllowed) \ +| fields srchIndexesAllowed, srchIndexesDefault, roles \ +| map \ + [| inputlookup splunkadmins_indexlist where index="$srchIndexesAllowed$" AND index!="requiredformvexpand"\ + | eval regex="^" . "$srchIndexesAllowed$" . "$" \ + | eval regex=replace(regex,"\*",".*") \ + | eval regex=if(substr(regex,1,3)=="^.*","^[^_].*" . substr(regex,4),regex) \ + | where match(index,regex) \ + | eval srchIndexesAllowed="$srchIndexesAllowed$", srchIndexesDefault="$srchIndexesDefault$", roles="$roles$" \ + | fields index, roles, srchIndexesAllowed, srchIndexesDefault ] maxsearches=5000 \ +| stats values(index) AS srchIndexesAllowed, values(srchIndexesDefault) AS srchIndexesDefault by roles \ +| makemv srchIndexesDefault tokenizer=(\S+) \ +| mvexpand srchIndexesDefault \ +| append [ | makeresults | eval srchIndexesAllowed="workaround for map errors", srchIndexesDefault="to pass appinspect", roles="N/A" ]\ +| map \ + [| inputlookup splunkadmins_indexlist where index="$srchIndexesDefault$" \ + | eval regex="^" . "$srchIndexesDefault$" . "$" \ + | eval regex=replace(lower(regex),"\*",".*") \ + | eval regex=if(substr(regex,1,3)=="^.*","^[^_].*" . 
substr(regex,4),regex) \
+ | where match(index,regex) \
+ | eval srchIndexesAllowed="$srchIndexesAllowed$", srchIndexesDefault="$srchIndexesDefault$", roles="$roles$" \
+ | fields index, roles, srchIndexesAllowed, srchIndexesDefault ] maxsearches=5000 \
+| where srchIndexesAllowed!="workaround for map errors"\
+| stats values(srchIndexesAllowed) AS srchIndexesAllowed, values(index) AS srchIndexesDefault by roles \
+| makemv srchIndexesAllowed tokenizer=(\S+) \
+| append \
+ [| rest /services/admin/LDAP-groups `splunkadmins_restmacro` \
+ | where isnotnull(roles) \
+ | mvexpand users \
+ | rex field=users "CN=(?P<user>[^,]+)" \
+ | stats values(user) AS user by roles ] \
+| append \
+ [| rest /services/authentication/users `splunkadmins_restmacro` \
+ | search type=Splunk \
+ | table title, roles \
+ | rename title AS user \
+ | mvexpand roles ] \
+| append \
+ [| makeresults \
+ | eval user="splunk-system-user", roles="admin" ]\
+| eval srchIndexesDefault = if(srchIndexesDefault=="requiredformvexpand",null(),srchIndexesDefault) \
+| eventstats values(srchIndexesAllowed) AS srchIndexesAllowed, values(srchIndexesDefault) AS srchIndexesDefault by roles \
+| stats values(srchIndexesAllowed) AS srchIndexesAllowed, values(srchIndexesDefault) AS srchIndexesDefault by user\
+| outputlookup splunkadmins_userlist_indexinfo
+
+[SearchHeadLevel - Index list report]
+action.email.useNSSubject = 1
+alert.track = 0
+description = Report only? Yes. This report outputs a list of indexes available and an additional "requiredformvexpand" value, which is used by the report "SearchHeadLevel - Index access list by user"
+dispatch.earliest_time = -30d@d
+dispatch.latest_time = now
+display.events.fields = ["index","sourcetype","host"]
+display.events.list.drilldown = none
+display.events.list.wrap = 0
+display.events.maxLines = 100
+display.events.raw.drilldown = none
+display.events.rowNumbers = 1
+display.events.table.drilldown = 0
+display.general.type = statistics
+display.page.search.mode = fast
+display.page.search.tab = statistics
+display.statistics.drilldown = none
+display.statistics.wrap = 0
+display.visualizations.charting.chart = area
+display.visualizations.show = 0
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = | eventcount summarize=false index=* OR index=_* \
+| fields index \
+| dedup index\
+| append [ | makeresults | eval index="requiredformvexpand" ]\
+| table index\
+| outputlookup splunkadmins_indexlist
+
+[SearchHeadLevel - Scheduled Search Efficiency]
+action.email.useNSSubject = 1
+alert.track = 0
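+# Worked example of the efficiency figure calculated below (illustrative numbers only): a search
+# scheduled hourly runs roughly 168 times per week, so ran_every_x_mins = 60/(168/168) = 60; with
+# average_runtime_in_sec = 120 the efficiency is 60/(120/60) = 30. Lower values therefore flag
+# searches whose runtime is long relative to how frequently they are scheduled.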
+description = Report only? Yes. This report was originally found on answers or a Splunk conf talk, it lists the scheduled searches, how often they run and how long they have taken to run
+dispatch.earliest_time = -30d@d
+dispatch.latest_time = now
+display.events.fields = ["index","sourcetype","host"]
+display.events.list.drilldown = none
+display.events.list.wrap = 0
+display.events.maxLines = 100
+display.events.raw.drilldown = none
+display.events.rowNumbers = 1
+display.events.table.drilldown = 0
+display.general.type = statistics
+display.page.search.mode = fast
+display.page.search.tab = statistics
+display.statistics.drilldown = none
+display.statistics.wrap = 0
+display.visualizations.charting.chart = area
+display.visualizations.show = 0
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```This likely came from a Splunk conf presentation but I cannot remember which one so cannot attribute the original author!\
+Determine the length of time a scheduled search takes to run compared to how often it is configured to run, excluding acceleration jobs```\
+index=_internal `searchheadhosts` sourcetype=scheduler source=*scheduler.log (user=*) savedsearch_name!="_ACCELERATE_DM*"\
+| stats avg(run_time) as average_runtime_in_sec count(savedsearch_name) as num_times_per_week sum(run_time) as total_runtime_sec by savedsearch_name user app host\
+| eval ran_every_x_mins=round(60/(num_times_per_week/168))\
+| eval average_runtime_duration=tostring(round(average_runtime_in_sec/60,2), "duration")\
+| eval average_runtime_in_sec=round(average_runtime_in_sec, 2)\
+| eval efficiency=round(((60/(num_times_per_week/168))/(average_runtime_in_sec/60)), 2)\
+| sort efficiency\
+| table savedsearch_name, app, average_runtime_duration, num_times_per_week, ran_every_x_mins, efficiency, user, host
+
+[AllSplunkLevel - Data Loss on shutdown]
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 3
+auto_summarize.dispatch.earliest_time = -1d@h
+counttype = number of events
+cron_schedule = 14 7 * * *
+description = Chance the alert requires action? Moderate. This alert usually indicates that at least some data was lost during the shutdown of the forwarder (universal or heavy); while nothing can be done about the lost data, the queues may be tweaked to improve this scenario (assuming the receiving servers were up)
+dispatch.earliest_time = -24h
+dispatch.latest_time = now
+display.general.type = statistics
+display.page.search.tab = statistics
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```Waiting for TcpOutputGroups to shutdown followed by the message 'Forcing TcpOutputGroups to shutdown after timeout' is not something that can be controlled as of 7.2.1. It also generally correlates with data loss as the forwarder in question could not get the data through the queues and out of the forwarder/indexer before shutdown. Increasing parsingQueue/aggQueue or similar may help in this scenario```\
+index=_internal sourcetype=splunkd (`splunkadmins_splunkd_source`) OR (`splunkadmins_splunkuf_source`) "Forcing TcpOutputGroups to shutdown after timeout"\
+| stats count, earliest(_time) AS firstSeen, latest(_time) AS lastSeen by host \
+| eval firstSeen = strftime(firstSeen, "%+"), lastSeen=strftime(lastSeen, "%+")
+disabled = 1
+
+[SearchHeadLevel - Dashboard load times]
+action.email.useNSSubject = 1
+alert.track = 0
+description = Report only? Yes. 
+
+[SearchHeadLevel - Dashboard load times]
+action.email.useNSSubject = 1
+alert.track = 0
+description = Report only? Yes. This report was created to roughly measure dashboard load times. If the timepicker for the dashboard is changed then this report will produce different results, so it should either be run on a dashboard with consistent time periods or used only as a rough measure of load times...
+dispatch.earliest_time = -24h@h
+dispatch.latest_time = now
+display.events.fields = ["index","sourcetype","host"]
+display.events.list.drilldown = none
+display.events.list.wrap = 0
+display.events.maxLines = 100
+display.events.raw.drilldown = none
+display.events.rowNumbers = 1
+display.events.table.drilldown = 0
+display.general.type = statistics
+display.page.search.mode = fast
+display.page.search.tab = statistics
+display.statistics.drilldown = none
+display.statistics.wrap = 0
+display.visualizations.charting.chart = area
+display.visualizations.show = 0
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```Determine dashboard load times by using the introspection index, note that changes in the timepicker or similar will change your dashboard load times so this is provided as an example query only```\
+index=_introspection `searchheadhosts` sourcetype=splunk_resource_usage data.search_props.sid::* "UI:Dashboard"\
+| eval app = 'data.search_props.app'\
+| eval elapsed = 'data.elapsed'\
+| eval label = 'data.search_props.label'\
+| eval type = 'data.search_props.type'\
+| eval mode = 'data.search_props.mode'\
+| eval user = 'data.search_props.user'\
+| eval provenance='data.search_props.provenance'\
+| eval label=coalesce(label, provenance)\
+| eval search_head = 'data.search_props.search_head'\
+| stats max(elapsed) as runtime earliest(_time) as Started by type, mode, app, user, label, data.pid, search_head, host\
+| stats max(runtime) AS totalRuntime by Started, app, user, label\
+| sort - Started\
+| eval Started=strftime(Started,"%+")\
+| eval duration = tostring(totalRuntime, "duration")
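+
+# In the report above, the single quotes in 'data.elapsed' are how eval dereferences field
+# names containing dots, and tostring(<seconds>, "duration") renders a number as [D+]HH:MM:SS.
+# A quick sketch to verify the rendering (the 3725-second value is purely illustrative):
+# | makeresults | eval totalRuntime=3725 | eval duration = tostring(totalRuntime, "duration")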
+
+[SearchHeadLevel - Scheduled searches status]
+action.email.useNSSubject = 1
+alert.track = 0
+description = Report only? Yes. Based on the discussion with Burch on https://answers.splunk.com/answers/702075/what-is-the-best-way-to-find-searches-without-sour.html, this is a simple way of reporting on whether the sourcetype/index fields are used in saved searches
+dispatch.earliest_time = -30d@d
+dispatch.latest_time = now
+display.events.fields = ["index","sourcetype","host"]
+display.events.list.drilldown = none
+display.events.list.wrap = 0
+display.events.maxLines = 100
+display.events.raw.drilldown = none
+display.events.rowNumbers = 1
+display.events.table.drilldown = 0
+display.general.type = statistics
+display.page.search.mode = fast
+display.page.search.tab = statistics
+display.statistics.drilldown = none
+display.statistics.wrap = 0
+display.visualizations.charting.chart = area
+display.visualizations.show = 0
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = | rest /servicesNS/-/-/saved/searches `splunkadmins_restmacro`\
+ | fields qualifiedSearch, next_scheduled_time, title, eai:acl.owner, eai:acl.app\
+ ```Based on the discussion with Burch on https://answers.splunk.com/answers/702075/what-is-the-best-way-to-find-searches-without-sour.html just a simple way of reporting if sourcetype/index fields are used in saved searches```\
+ | where match( qualifiedSearch , "^\s*search\s*" )\
+ | rex field=qualifiedSearch "(?s)^(?<base_search>search[^\|\[]+)"\
+ | eval\
+ check-sourcetype = if( match( base_search , "\s+sourcetype\s*=" ) , "defined" , "missing" ) ,\
+ check-index = if( match( base_search , "\s+index\s*(=|IN)" ) , "defined" , "missing" ) ,\
+ check-index-contains-wildcard = if( match( base_search , "\s+index\s*(=\s*[^\*]+(\s|$)|IN\s*\([^\)\*]+\s*\))" ) , "missing" , "defined" ) ,\
+ check-index-starts-wildcard = if( match( base_search , "\s+index\s*(=\s*\*|IN\s*\(\s*\*)" ) , "defined" , "missing" ) ,\
+ check-hidden = if( match( base_search , "\s+((tag|eventtype)\s*=)" ) , "defined" , "missing" ) ,\
+ check-macro = if( match( base_search , "\`[^\`]+\`" ) , "defined" , "missing" ) ,\
+ check-scheduled = if( match( next_scheduled_time , ".+" ) , "defined" , "missing" )\
+ | rename eai:acl.* AS namespace-*\
+ | search ( check-sourcetype="missing" OR check-index="missing" ) check-hidden="missing" check-scheduled="defined"\
+ | table title, check-index, check-sourcetype, base_search, namespace-*, check-index-contains-wildcard, check-index-starts-wildcard, check-macro
+
+[SearchHeadLevel - Detect changes to knowledge objects]
+action.email.useNSSubject = 1
+alert.track = 0
+dispatch.earliest_time = -7d@d
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.show = 0
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+description = Report only? Yes. Attempt to determine what times knowledge object config was changed by using splunk access logs (information not available in audit logs at the time of testing). Also refer to SearchHeadLevel - Detect changes to knowledge objects directory/non-directory
+search = ```Attempt to determine changes to knowledge objects, or the creation of new knowledge objects, on a per-application/type basis. Using splunkd*access logs due to lack of information in other logs in non-clustered search heads. Everything works fine unless the /services/ endpoint is used, in which case we have no idea which app owned the updated item and we have to just assume it could be any app. 
Also the query is super-complicated because Splunk often provides 3 endpoints to edit the same config, it also allows // or similar in the URL. Also refer to the alternative queries for SearchHeadLevel - Detect changes to knowledge objects directory, and SearchHeadLevel - Detect changes to knowledge objects non-directory```\ +index=_internal sourcetype=splunkd_access OR sourcetype=splunkd_ui_access method=POST status=200 OR status=201 NOT "/manager/" NOT "/dispatch HTTP"\ +| rex field=uri "/servicesNS/[^/]+/(?P<app>[^/]+)" \ +| eval type=case(match(uri,"/data/+props/+calcfields($|/)"),"calcfields",match(uri,"/saved/+searches($|/)"),"savedsearch",match(uri,"/admin/+savedsearch($|/)"),"savedsearch",match(uri,"/configs/+conf-savedsearches"),"savedsearch",match(uri,"/data/+ui/+views($|/)"),"dashboards",match(uri,"/+admin/+views($|/)"),"dashboards",match(uri,"/data/+props/+fieldaliases($|/)"),"fieldaliases",match(uri,"/+admin/+fieldaliases(/|$)"),"fieldaliases",match(uri,"data/+props/+extractions"),"fieldextractions",match(uri,"/+admin/+props-extract($|/)"),"fieldextractions",match(uri,"/data/+transforms/+extractions($|/)"),"fieldtransformations",match(uri,"/+admin/+transforms-extract($|/)"),"fieldtransformations",match(uri,"data/+ui/+workflow-actions($|/)"),"workflow-actions",match(uri,"/+admin/+workflow-actions($|/)"),"workflow-actions",match(uri,"/+configs/+conf-workflow_actions($|/)"),"workflow-actions",match(uri,"/+configs/+conf-props($|/)"),"props*",match(uri,"/+configs/+conf-transforms($|/)"),"transforms*",match(uri,"/+data/+props/+sourcetype-rename($|/)"),"sourcetype-renaming",match(uri,"/+admin/+sourcetype-rename($|/)"),"sourcetype-renaming",match(uri,"/+admin/+tags($|/)"),"tags",match(uri,"/+saved/+(n|fv)tags($|/)"),"tags",match(uri,"/+configs/+conf-tags($|/)"),"tags",match(uri,"/+saved/+eventtypes($|/)"),"eventtypes",match(uri,"/+admin/+eventtypes($|/)"),"eventtypes",match(uri,"/+configs/+conf-eventtypes($|/)"),"eventtypes_conf",match(uri,"/+data/+ui/+nav($|/)"),"navMenu",match(uri,"/+admin/*nav($|/)"),"navMenu",match(uri,"/+datamodel/+model($|/)"),"datamodel",match(uri,"/+configs/+conf-datamodels($|/)"),"datamodel",match(uri,"/+admin/+datamodel-files($|/)"),"datamodel",match(uri,"/+admin/+datamodeledit($|/)"),"datamodel",match(uri,"/+storage/+collections/+config($|/)"),"kvstore",match(uri,"/+configs/+conf-collections($|/)"),"kvstore",match(uri,"/+admin/+collections-conf($|/)"),"kvstore",match(uri,"/+data/+ui/+times($|/)"),"times",match(uri,"/+configs/+conf-times($|/)"),"times",match(uri,"/+admin/+conf-times($|/)"),"times",match(uri,"/+data/+ui/+panels($|/)"),"panels",match(uri,"/+configs/+conf-panels($|/)"),"panels",match(uri,"/+data/+props/+lookups($|/)"),"lookup-definition",match(uri,"/+admin/+props-lookup($|/)"),"lookup-definition",match(uri,"/+data/+transforms/+lookups($|/)"),"automaticlookup",match(uri,"/+admin/+transforms-lookup($|/)"),"automaticlookup",match(uri,"/+admin/+macros($|/)"),"macros",match(uri,"/+configs/+conf-macros($|/)"),"macros",match(uri,"/+data/+macros($|/)"),"macros",1==1,"unknown")\ +| where type!="unknown"\ +| eval app=if(isnull(app),"*",app)\ +| eval type2 = if(type=="eventtypes","tags",null())\ +```note that eventtypes can have tags created so assume eventtype == tag creation, eventtypes_conf doesn't appear to work```\ +| stats values(_time) AS times, values(user) AS userList, values(type2) AS type2 by type, app + +[SearchHeadLevel - Detect changes to knowledge objects directory] +action.email.useNSSubject = 1 +alert.track = 0 +dispatch.earliest_time = 
-7d@d
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.show = 0
+description = Report only? Yes. Attempt to determine what times knowledge object config was changed by using the directory REST API endpoint. Note this does not cover all knowledge objects; also refer to SearchHeadLevel - Detect changes to knowledge objects non-directory
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = | rest "/services/directory" count=0 `splunkadmins_restmacro`\
+| table updated, eai:type, eai:acl.app, eai:location\
+| eval updatedEpoch=strptime(updated,"%Y-%m-%dT%H:%M:%S%:z")\
+| rename eai:type AS type, eai:acl.app AS app, eai:location AS location\
+| stats count by type, app\
+| fields - count
+
+[SearchHeadLevel - Detect changes to knowledge objects non-directory]
+action.email.useNSSubject = 1
+alert.track = 0
+dispatch.earliest_time = -7d@d
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.show = 0
+description = Report only? Yes. Attempt to determine what times knowledge object config was changed by querying REST API endpoints not covered by the directory endpoint. Note this does not cover all knowledge objects; also refer to SearchHeadLevel - Detect changes to knowledge objects directory
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = | rest `splunkadmins_restmacro` /servicesNS/-/-/storage/collections/config count=0 f=updated f=eai:appName\
+| eval type="kvstore"\
+| append [ | rest `splunkadmins_restmacro` /servicesNS/-/-/datamodel/model count=0 f=updated f=eai:appName | eval type="datamodel" ]\
+| append [ | rest `splunkadmins_restmacro` /servicesNS/-/-/data/ui/panels count=0 f=updated f=eai:appName | eval type="panels" ]\
+| append [ | rest `splunkadmins_restmacro` /servicesNS/-/-/data/props/calcfields count=0 | table updated, eai:acl.app | rename eai:acl.app AS eai:appName | eval type="calcfields" ]\
+| eval updatedEpoch=strptime(updated,"%Y-%m-%dT%H:%M:%S%:z")\
+| rename eai:appName AS app\
+| stats count by type, app
+
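+# Both reports above parse the atom-style 'updated' timestamp into updatedEpoch and then
+# discard it; a hypothetical variation that keeps the most recent change per type/app instead:
+# | stats max(updatedEpoch) AS lastChanged by type, app
+# | eval lastChanged=strftime(lastChanged, "%+")
+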
+[IndexerLevel - Corrupt buckets via DBInspect]
+action.email.useNSSubject = 1
+alert.track = 0
+dispatch.earliest_time = -7d@d
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.show = 0
+description = Report only? Yes. Run the dbinspect command to determine which buckets appear to be corrupt. The search ignores hot/streaming hot buckets as they appear to be corrupt when tested in 7.0.5. Note that this appears to check metadata, not journal files, so splunk fsck may still be required
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = | dbinspect corruptonly=true index=*\
+| regex path!="[\\\\/](\d+_|hot_)[^ ]+$"\
+| table bucketId, corruptReason, index, modTime, path, splunk_server, state\
+| eval command="splunk fsck repair --one-bucket --bucket-path=" + path + " &"
+
+[SearchHeadLevel - Maximum memory utilisation per search]
+action.email.useNSSubject = 1
+alert.track = 0
+dispatch.earliest_time = -24h@h
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.show = 0
+description = Report only? Yes. As found on SplunkAnswers, check the maximum memory usage per-search ( https://answers.splunk.com/answers/500973/how-to-improve-my-search-to-identify-queries-which.html )
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```As originally found on https://answers.splunk.com/answers/500973/how-to-improve-my-search-to-identify-queries-which.html / DalJeanis with minor modifications. Max memory used per search process at search head level```\
+index=_introspection `searchheadhosts` sourcetype=splunk_resource_usage component=PerProcess data.search_props.sid=*\
+| stats max(data.mem_used) AS peak_mem_usage,\
+ latest(data.search_props.mode) AS mode,\
+ latest(data.search_props.type) AS type,\
+ latest(data.search_props.role) AS role,\
+ latest(data.search_props.app) AS app,\
+ latest(data.search_props.user) AS user,\
+ latest(data.search_props.provenance) AS provenance,\
+ latest(data.search_props.label) AS label,\
+ latest(host) AS splunk_server,\
+ min(_time) AS min_time,\
+ max(_time) AS max_time\
+ by data.search_props.sid, host\
+| sort - peak_mem_usage\
+| head 50\
+| table provenance, peak_mem_usage, label, mode, type, role, app, user, min_time, max_time, data.search_props.sid splunk_server\
+| eval min_time=strftime(min_time, "%+"), max_time=strftime(max_time, "%+")\
+| rename data.search_props.sid AS sid,\
+ peak_mem_usage AS "Peak Physical Memory Usage (MB)",\
+ min_time AS "First time seen",\
+ max_time AS "Last time seen"
+
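+# data.mem_used from the PerProcess introspection component is reported in MB. To recover the
+# full query text behind a sid surfaced by these reports, the audit log can be queried
+# directly (the sid value below is purely illustrative):
+# index=_audit info=completed search_id="'1724412345.123'" | table user, savedsearch_name, search
+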
+[IndexerLevel - Maximum memory utilisation per search]
+action.email.useNSSubject = 1
+alert.track = 0
+dispatch.earliest_time = -24h@h
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.show = 0
+description = Report only? Yes. As found on SplunkAnswers, check the maximum memory usage per-search ( https://answers.splunk.com/answers/500973/how-to-improve-my-search-to-identify-queries-which.html )
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```As originally found on https://answers.splunk.com/answers/500973/how-to-improve-my-search-to-identify-queries-which.html / DalJeanis with minor modifications. Max memory used per search process at indexer level```\
+index=_introspection `indexerhosts` sourcetype=splunk_resource_usage component=PerProcess data.search_props.sid=*\
+| stats max(data.mem_used) AS peak_mem_usage,\
+ latest(data.search_props.mode) AS mode,\
+ latest(data.search_props.type) AS type,\
+ latest(data.search_props.role) AS role,\
+ latest(data.search_props.app) AS app,\
+ latest(data.search_props.user) AS user,\
+ latest(data.search_props.provenance) AS provenance,\
+ latest(data.search_props.label) AS label,\
+ latest(host) AS splunk_server,\
+ min(_time) AS min_time,\
+ max(_time) AS max_time\
+ by data.search_props.sid, host\
+| sort - peak_mem_usage\
+| head 50\
+| table provenance, peak_mem_usage, label, mode, type, role, app, user, min_time, max_time, data.search_props.sid splunk_server\
+| eval min_time=strftime(min_time, "%+"), max_time=strftime(max_time, "%+")\
+| rename data.search_props.sid AS sid,\
+ peak_mem_usage AS "Peak Physical Memory Usage (MB)",\
+ min_time AS "First time seen",\
+ max_time AS "Last time seen"
+
+[SearchHeadLevel - Detect Excessive Search Use - Dashboard - Automated]
+action.email.useNSSubject = 1
+alert.track = 0
+dispatch.earliest_time = -24h@h
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.show = 0
+description = Report only? Yes. Based on the Detect Excessive Search Use dashboard, attempt to automate detection of dashboards loaded by multiple users
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```Based on contents of Detect Excessive Search Use, with the addition of narrowing down to dashboard loads with more than 1 user. From there, attempt to auto-run the introspection index query to determine which dashboards may be involved and which apps. 
Finally this finishes with a sendresults email to the customer to advise them to consider changing their dashboard to use scheduled saved searches```\ +index=_audit info=granted "search='" NOT "savedsearch_name=\"Threat - Correlation Searches - Lookup Gen\"" NOT "savedsearch_name=\"Bucket Copy Trigger\"" NOT "search='| copybuckets" NOT "search='search index=_telemetry sourcetype=splunk_telemetry | spath" NOT "savedsearch_name=\"_ACCELERATE_*"\ +| rex "(?s), search='(?P<search>.*)\]$" \ +| regex search!="\|\s+(rest|inputlookup|makeresults|tstats count AS \"Count of [^\"]+\"\s+ from sid=)"\ +| rex "apiEndTime='[^,]+, savedsearch_name=\"(?P<savedsearch_name>[^\"]+)"\ +| eval apiEndTime=strptime(apiEndTime, "'%a %B %d %H:%M:%S %Y'"), apiStartTime=strptime(apiStartTime, "'%a %B %d %H:%M:%S %Y'")\ +| eval timePeriod=apiEndTime-apiStartTime\ +| bin _time span=10m\ +| eval search_id=substr(search_id,2,len(search_id)-2)\ +| stats count, values(host) AS hostList, values(savedsearch_name) AS savedSearchName, values(ttl) AS ttl by search, user, _time, timePeriod\ +| eval frequency = ceil((10*60)/timePeriod)\ +| fillnull frequency\ +| where count>4 AND count>frequency\ +| eval timePeriod=tostring(timePeriod,"duration")\ +| stats sum(count) AS count, max(count) AS "maxCountPerSpan", values(user) AS userList, values(hostList) AS hostList, values(savedSearchName) AS savedSearchName, earliest(_time) AS firstSeen, latest(_time) AS mostRecent by search\ +| where mvcount(userList) > 2 AND match(savedSearchName, "^search\d+")\ +| stats values(sids) AS sids, max(count) AS count, max(maxCountPerSpan) AS maxCountPerSpan by firstSeen, mostRecent, userList, hostList\ +| stats values(userList) AS userList, values(hostList) AS hostList by firstSeen, mostRecent, count, maxCountPerSpan\ +| addinfo\ +| eval searchDuration = tostring(info_max_time - info_min_time, "duration")\ +| sort - count\ +| head 20\ +| append [ | makeresults | eval userList="nonexistentuser", loadCount=0, searchDuration="10", count="0" | fields - _time ]\ +| map\ + [ search index=_introspection `indexerhosts` sourcetype=splunk_resource_usage\ + [| makeresults\ + | eval data.search_props.user=$userList$\ + | makemv data.search_props.user\ + | mvexpand data.search_props.user\ + | return 20 data.search_props.user ]\ + | eval users=$userList$, loadCount=$count$, searchDuration=$searchDuration$\ + | stats count, values(users) AS userList, values(loadCount) AS loadCount, values(searchDuration) AS searchDuration by data.search_props.provenance, data.search_props.app\ + | search data.search_props.provenance=UI:Dashboard*\ + | rename data.search_props.provenance AS provenance, data.search_props.app AS app ] maxsearches=20\ +| search ```Exclusions lists apply at this point as we have app/dashboard context``` NOT [ | inputlookup dashboard_automated_app_exclusion.csv ] NOT [ | inputlookup dashboard_automated_app_history.csv | makemv searchInfo tokenizer=(\S+) | mvexpand searchInfo | rename searchInfo AS provenance | fields - currtime ]\ +| makemv userList\ +| eval userCount = mvcount(userList)\ +| mvexpand userList\ + ```Beyond this point this search will likely need customisation to work in a particular environment...\ +| ldapfilter search="(&(CN=$userList$)(objectClass=organizationalPerson))" attrs="mail"\ +| where isnotnull(mail)\ +```\ +| stats values(provenance) AS searchInfo, values(searchDuration) AS searchDuration, values(loadCount) AS loadCount, values(mail) AS email_to, max(userCount) AS maxUsersPerDashboard by app\ +| nomv email_to\ +| rex mode=sed 
field=email_to "s/ /;/g"\ + ```\ +| sendresults subject="Shared dashboards not using saved searches" msgstyle="table {font-family:Arial;font-size:12px;border: 1px solid black;padding:3px}th {background-color:#AAAAAA;color:#fff;border-left: solid 1px #e9e9e9} td {border:solid 1px #e9e9e9}" showemail=f showsubj=f\ +| fields app, searchInfo\ +| eval currtime=now()\ +| outputlookup dashboard_automated_app_history.csv append=true\ +``` + +[ForwarderLevel - Splunk HEC issues] +action.keyindicator.invert = 0 +alert.suppress = 0 +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 3 +counttype = number of events +cron_schedule = 57 3 * * * +description = Chance the alert requires action? High. HEC failures mean that the client is expected to handle the failure, this may occur when queues on the forwarder are full. Note that the lookup splunkadmins_hec_reply_code_lookup is based on https://docs.splunk.com/Documentation/Splunk/8.2.3/Data/TroubleshootHTTPEventCollector and this may change over time +dispatch.earliest_time = -24h@h +dispatch.latest_time = now +display.events.fields = ["source","sourcetype","host"] +display.visualizations.charting.chart = bar +enableSched = 1 +quantity = 0 +relation = greater than +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = ```Find if the HEC is throwing errors. Refer to https://docs.splunk.com/Documentation/Splunk/latest/Data/TroubleshootHTTPEventCollector for more information``` \ +index=_internal "ERROR HttpInputDataHandler" sourcetype=splunkd (`splunkadmins_splunkd_source`) `splunkenterprisehosts` \ +| eval event_message=coalesce(event_message,message) \ +| rex field=event_message mode=sed "s/(http_input_body_size=)\d+|(totalRequestSize=)\d+/\1/" \ +| rex field=event_message mode=sed "s/(channel=)[^,]+/channel=/" \ +| bin _time span=4h \ +| fillnull reply \ +| stats count by host, event_message, _time, reply \ +| cluster t=0.999 field=event_message showcount=true \ +| lookup splunkadmins_hec_reply_code_lookup status_code AS reply \ +| eval count=count+cluster_count \ +| fields - cluster_* \ +| sort _time +disabled = 1 + +[SearchHeadLevel - Lookup updates within SHC] +action.email.useNSSubject = 1 +alert.track = 0 +dispatch.earliest_time = -24h@h +dispatch.latest_time = now +display.events.fields = ["host","source","sourcetype"] +display.general.type = statistics +display.page.search.tab = statistics +display.visualizations.show = 0 +description = Report only? Yes. 
Excessive CSV lookup updates can trigger extra bundle replication issues to the indexer cluster(s); this report identifies the lookups with the largest number of updates
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```Excessive CSV lookup updates can trigger extra bundle replication issues to the indexer cluster(s); this report identifies the lookups with the largest number of updates```\
+index=_internal `searchheadhosts` source=*conf.log sourcetype=splunkd_conf lookups "data.asset_uri{}"=lookups "data.optype_desc"=NOTIFY_UPDATE_LOOKUP ```data.task=acceptPush also appears to work...``` data.task=addCommit \
+| rename "data.asset_uri{}" AS asset\
+| eval lookup_user=mvindex(asset, 0), lookup_app=mvindex(asset, 1), lookup_file=mvindex(asset, 3)\
+| stats min(_time) AS firstSeen, max(_time) AS lastSeen, count by lookup_file\
+| eval frequencyOfUpdatesInMins = round(((lastSeen-firstSeen)/60)/count)\
+| eval firstSeen=strftime(firstSeen, "%+"), lastSeen=strftime(lastSeen, "%+")\
+| sort - count
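+
+# A hypothetical refinement of the report above: split the update counts by the owning app
+# and user as well as the file, to see who or what drives the churn (fields as extracted above):
+# | stats min(_time) AS firstSeen, max(_time) AS lastSeen, count by lookup_app, lookup_user, lookup_file
+# | sort - count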
+
+[IndexerLevel - Knowledge bundle upload stats]
+action.email.useNSSubject = 1
+alert.track = 0
+dispatch.earliest_time = -24h@h
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.show = 0
+description = Report only? Yes. Attempt to query the indexing tier to determine how often the indexers receive new knowledge bundles from the various search tiers. From here, calculate the time period between uploads, how long each takes, and how many bundles arrive during the time period
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```Query the indexing tier to determine how often they are receiving new knowledge bundles from the various search tiers. From here calculate the time period between uploads, how long it takes and how many bundles during the time period\
+ Alternatives would be to check the bundle uploads via: 'index=_internal sourcetype=splunkd source=*metrics.log* group=bundles_uploads' or the bundle downloads with group=bundles_downloads, in particular the baseline_count/delta_count appear to show the different methods used but it appears to be more accurate to check the splunkd access logs on the indexing tier itself in 7.0.x. In 8.0.x the cascading bundle adds various complications; if no cascading bundle is in use you can drop the appends completely```\
+ index=_internal `indexerhosts` sourcetype=splunkd_access source=*splunkd_access.log (/services/receivers/bundle OR /services/replication/cascading/upload/payload) method=POST \
+| rex field=uri "/services/receivers/(?P<type>[^/]+)/(?P<guid>[^/]+)" \
+| rex field=uri "/cascading/upload/payload/(?P<planid>[^/]+)$" \
+| append \
+ [ search index=_internal `indexerhosts` sourcetype=splunkd `splunkadmins_metrics_source` TERM("group=cascading") TERM("name=per_peer_replication") OR TERM("name=plan_metadata") \
+ | rex "https://(?P<indexer_ip>[^:]+)" \
+ | lookup dnslookup clientip AS indexer_ip OUTPUT clienthost AS indexer \
+ | eventstats values(endpoint) AS type, values(init_server) AS guid by planid \
+ | stats count, values(guid) AS guid, values(type) AS type, latest(_time) AS _time by planid, indexer \
+ | eval replication_mode="cascading" \
+ | rename indexer AS host ] \
+| eval planid=upper(planid)\
+| eventstats values(guid) AS guid, values(type) AS type by planid\
+| sort 0 _time\
+| eval type=case(type=="delta_bundle","delta-bundle",type=="full_bundle","full-bundle",type=="bundle-delta","delta-bundle",type=="bundle","full-bundle",1=1,type)\
+| eval guid=if(match(guid,"\."),guid,upper(guid)) \
+| streamstats global=false window=1 current=f last(_time) AS lastBundle by host, guid, type \
+| eval delta = if(isnotnull(lastBundle), _time - lastBundle,null()) \
+| fillnull delta value="N/A" \
+| eventstats count AS bundleUploadCount by host, guid, type \
+| append \
+ [ search index=_internal `indexerhosts` sourcetype=splunkd `splunkadmins_metrics_source` TERM("group=bundle_replication") \
+ | rex field=bundle_id "^(?P<guid>.*?)-\d+$" \
+ | stats earliest(_time) AS earliesttime, latest(_time) AS mostrecenttime, values(guid) AS guid, max(apply_time_msec) AS max_apply_time_msec, avg(apply_time_msec) AS avg_apply_time_msec by bundle_id, bundle_type \
+ | eval deploy_time=mostrecenttime-earliesttime \
+ | rename bundle_type AS type \
+ | fields guid, type, deploy_time, max_apply_time_msec, avg_apply_time_msec, replication_mode ] \
+ ```The metrics.log uses delta_bundle, the other logs use bundle-delta or a variation of it``` \
+| eval type=case(type=="delta_bundle","delta-bundle",type=="full_bundle","full-bundle",type=="bundle-delta","delta-bundle",type=="bundle","full-bundle",1=1,type) \
+| eval guid=if(match(guid,"\."),guid,upper(guid)) \
+| stats latest(_time) AS mostRecent, max(bundleUploadCount) AS bundleUploadsInTimePeriod, max(delta) AS largestTimeDeltaInSeconds, min(delta) AS minTimeDeltaInSeconds, avg(avg_apply_time_msec) AS avgApplySeconds, max(max_apply_time_msec) AS maxApplySeconds, min(deploy_time) AS min_deploy_time, max(deploy_time) AS max_deploy_time, avg(deploy_time) AS avg_deploy_time, values(replication_mode) AS replication_mode by guid, type \
+| fillnull replication_mode value="classic" \
+| eval mostRecent=strftime(mostRecent, "%+"), avgApplySeconds=round(avgApplySeconds/1000,3), maxApplySeconds=round(maxApplySeconds/1000,3) \
+| addinfo \
+| eval minsBetweenUploads=round(((info_max_time-info_min_time)/60) / bundleUploadsInTimePeriod) \
+| table guid, type, mostRecent, bundleUploadsInTimePeriod, minsBetweenUploads, minTimeDeltaInSeconds, largestTimeDeltaInSeconds, min_deploy_time, max_deploy_time, avg_deploy_time, avgApplySeconds, maxApplySeconds, replication_mode
+
+[SearchHeadLevel - IndexesPerRole Remote 
Report] +action.email.useNSSubject = 1 +alert.track = 0 +description = Report only? Yes. Attempt to query a remote Splunk instance to find a list of accessible indexes per role within Splunk, relies on 2 other reports for the map commands, requires the "SearchHeadLevel - Index list report" report to be run to populate the lookup file splunkadmins_indexlist +dispatch.earliest_time = -5m +dispatch.latest_time = now +display.events.fields = ["host","index","linecount","source","sourcetype","splunk_server"] +display.general.type = statistics +display.page.search.patterns.sensitivity = 0.866 +display.page.search.tab = statistics +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = search ```remove the comments once you have Webtools Add-on, https://splunkbase.splunk.com/app/4146 installed to use the curl command...<begin> | curl ``` method=get uri="$url$/services/authorization/roles?output_mode=json&count=0&f=srchIndexesAllowed&f=srchIndexesDefault&f=imported_srchIndexesAllowed&f=imported_srchIndexesDefault" user="$user$" pass="$pass$" \ +| rex field=curl_message max_match=10000 "{\"name\":\"(?P<role>[^\"]+)\".*?\"imported_srchIndexesAllowed\":(?P<imported_srchIndexesAllowed>\[[^\]]*\]),\"imported_srchIndexesDefault\":(?P<imported_srchIndexesDefault>\[[^\]]*\]),\"srchIndexesAllowed\":(?P<srchIndexesAllowed>\[[^\]]*\]),\"srchIndexesDefault\":(?P<srchIndexesDefault>\[[^\]]*\])" \ +| fields - curl_* \ +| eval srchIndexesAllowed=mvzip(srchIndexesAllowed,imported_srchIndexesAllowed) \ +| eval srchIndexesDefault=mvzip(srchIndexesDefault,imported_srchIndexesDefault) \ +| eval data=mvzip(role,mvzip(srchIndexesDefault,srchIndexesAllowed,"%%%%"),"%%%%") \ +| fields data \ +| mvexpand data \ +| makemv delim="%%%%" data \ +| eval roles=mvindex(data,0), srchIndexesDefault=mvindex(data,1), srchIndexesAllowed=mvindex(data,2) \ +| fields - data \ +| eval srchIndexesDefault=replace(srchIndexesDefault,"(\[\],|,\[\]|\"|\[|\])","") \ +| eval srchIndexesAllowed=replace(srchIndexesAllowed,"(\[\],|,\[\]|\"|\[|\])","") \ +| makemv srchIndexesAllowed delim="," \ +| makemv srchIndexesDefault delim="," \ +| eval srchIndexesAllowed=if(mvcount(srchIndexesAllowed)==0 OR isnull(srchIndexesAllowed),"requiredformvexpand",srchIndexesAllowed), srchIndexesDefault=if(mvcount(srchIndexesDefault)==0 OR isnull(srchIndexesDefault),"requiredformvexpand",srchIndexesDefault) \ +| mvexpand srchIndexesAllowed \ +| eval srchIndexesAllowed=if(srchIndexesAllowed=="requiredformvexpand",null(),srchIndexesAllowed) \ +| eval srchIndexesAllowed=lower(srchIndexesAllowed) \ +| fields srchIndexesAllowed, srchIndexesDefault, roles \ +| append [ | makeresults | eval srchIndexesAllowed="NA", srchIndexesDefault="NA", roles="novalidroles", splunk_server="default" | fields - _time ]\ +| map \ + "SearchHeadLevel - IndexesPerRole srchIndexesallowed Report" maxsearches=15000 \ +| stats values(index) AS srchIndexesAllowed, values(srchIndexesDefault) AS srchIndexesDefault by roles, splunk_server \ +| eval srchIndexesDefault=replace(srchIndexesDefault,","," ") \ +| makemv srchIndexesDefault tokenizer=(\S+) \ +| mvexpand srchIndexesDefault \ +| append [ | makeresults | eval srchIndexesAllowed="NA", srchIndexesDefault="NA", roles="novalidroles", splunk_server="default" | fields - _time ]\ +| map \ + "SearchHeadLevel - IndexesPerRole srchIndexesdefault Report" maxsearches=15000 \ +| stats values(srchIndexesAllowed) AS srchIndexesAllowed, values(index) AS srchIndexesDefault by roles \ +| where roles!="novalidroles"\ +| makemv 
srchIndexesAllowed tokenizer=(\S+) \
+| eval srchIndexesDefault = if(srchIndexesDefault=="requiredformvexpand",null(),srchIndexesDefault)
+
+[SearchHeadLevel - IndexesPerRole srchIndexesallowed Report]
+action.email.useNSSubject = 1
+alert.track = 0
+description = Report only? Yes. Designed to be run via the map command with a single event; runs a lookup against srchIndexesAllowed and a regex match to see if the index name in the event matches srchIndexesAllowed... Requires the splunkadmins_indexlist lookup file
+dispatch.earliest_time = -5m
+dispatch.latest_time = now
+display.events.fields = ["host","index","linecount","source","sourcetype","splunk_server"]
+display.general.type = statistics
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = | inputlookup splunkadmins_indexlist where index="$srchIndexesAllowed$" AND index!="requiredformvexpand"\
+ | eval regex="^" . "$srchIndexesAllowed$" . "$" \
+ | eval regex=replace(regex,"\*",".*") \
+ | eval regex=if(substr(regex,1,3)=="^.*","^[^_].*" . substr(regex,4),regex) \
+ | where match(index,regex) \
+ | eval srchIndexesAllowed="$srchIndexesAllowed$", srchIndexesDefault="$srchIndexesDefault$", roles="$roles$", splunk_server="$splunk_server$"\
+ | fields index, roles, srchIndexesAllowed, srchIndexesDefault, splunk_server
+
+[SearchHeadLevel - IndexesPerRole srchIndexesdefault Report]
+action.email.useNSSubject = 1
+alert.track = 0
+description = Report only? Yes. Designed to be run via the map command with a single event; runs a lookup against srchIndexesDefault and a regex match to see if the index name in the event matches srchIndexesDefault... Requires the splunkadmins_indexlist lookup file
+dispatch.earliest_time = -5m
+dispatch.latest_time = now
+display.events.fields = ["host","index","linecount","source","sourcetype","splunk_server"]
+display.general.type = statistics
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = | inputlookup splunkadmins_indexlist where index="$srchIndexesDefault$"\
+ | eval regex="^" . "$srchIndexesDefault$" . "$" \
+ | eval regex=replace(regex,"\*",".*") \
+ | eval regex=if(substr(regex,1,3)=="^.*","^[^_].*" . substr(regex,4),regex) \
+ | where match(index,regex) \
+ | eval srchIndexesAllowed="$srchIndexesAllowed$", srchIndexesDefault="$srchIndexesDefault$", roles="$roles$", splunk_server="$splunk_server$"\
+ | fields index, roles, srchIndexesAllowed, srchIndexesDefault, splunk_server
+
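+# The two map-target reports above turn a role's wildcarded index pattern into a regex: "*"
+# becomes ".*", and a leading wildcard is rewritten to "^[^_].*" so internal indexes (names
+# starting with "_") are not matched. A sketch of the transformation with a purely
+# illustrative pattern of app_*:
+# | makeresults | eval regex="^app_*$" | eval regex=replace(regex,"\*",".*")
+# which yields ^app_.*$ for match() to test each index name against.
+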
+[SearchHeadLevel - IndexesPerRole Report]
+action.email.useNSSubject = 1
+alert.track = 0
+description = Report only? Yes. Query the local Splunk instance to determine the indexes available per role; requires the splunkadmins_indexlist lookup file
+dispatch.earliest_time = -5m
+dispatch.latest_time = now
+display.events.fields = ["host","index","linecount","source","sourcetype","splunk_server"]
+display.general.type = statistics
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = | rest /services/authorization/roles splunk_server="local" \
+| eval comment="This search aims to provide a giant list of users and what indexes they have access to (as in a list of index names, not a list of wildcards). Due to mvexpand hitting memory limits in the environment this alternative version runs many subsearches that do not hit the memory limits" \
+| table title, srchIndexesAllowed, srchIndexesDefault, imported_srchIndexesAllowed, imported_srchIndexesDefault \
+| rename title as roles \
+| makemv srchIndexesAllowed tokenizer=(\S+) \
+| makemv srchIndexesDefault tokenizer=(\S+) \
+| makemv imported_srchIndexesAllowed tokenizer=(\S+) \
+| makemv imported_srchIndexesDefault tokenizer=(\S+) \
+| eval srchIndexesAllowed = mvappend(srchIndexesAllowed, imported_srchIndexesAllowed) \
+| eval srchIndexesDefault = mvappend(srchIndexesDefault, imported_srchIndexesDefault) \
+| eval splunk_server="default"\
+| eval srchIndexesAllowed=if(mvcount(srchIndexesAllowed)==0 OR isnull(srchIndexesAllowed),"requiredformvexpand",srchIndexesAllowed), srchIndexesDefault=if(mvcount(srchIndexesDefault)==0 OR isnull(srchIndexesDefault),"requiredformvexpand",srchIndexesDefault)\
+| mvexpand srchIndexesAllowed \
+| eval srchIndexesAllowed=if(srchIndexesAllowed=="requiredformvexpand",null(),srchIndexesAllowed) \
+| eval srchIndexesAllowed=lower(srchIndexesAllowed) \
+| fields srchIndexesAllowed, srchIndexesDefault, roles, splunk_server\
+| map \
+ "SearchHeadLevel - IndexesPerRole srchIndexesallowed Report" maxsearches=15000 \
+| stats values(index) AS srchIndexesAllowed, values(srchIndexesDefault) AS srchIndexesDefault by roles, splunk_server\
+| eval srchIndexesDefault=replace(srchIndexesDefault,","," ")\
+| makemv srchIndexesDefault tokenizer=(\S+)\
+| mvexpand srchIndexesDefault \
+| map \
+ "SearchHeadLevel - IndexesPerRole srchIndexesdefault Report" maxsearches=15000 \
+| stats values(srchIndexesAllowed) AS srchIndexesAllowed, values(index) AS srchIndexesDefault by roles, splunk_server \
+| makemv srchIndexesAllowed tokenizer=(\S+) \
+| eval srchIndexesDefault = if(srchIndexesDefault=="requiredformvexpand",null(),srchIndexesDefault)\
+```You can add this in if you have multiple search head clusters or search heads ...| append [ savedsearch "SearchHeadLevel - IndexesPerRole Remote Report" url="..." user="..." pass="..." | eval splunk_server="..." ]```\
+| outputlookup splunkadmins_indexes_per_role
+
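+# Once the report above has populated the lookup, it can be sanity-checked directly, e.g. to
+# list everything a given role can reach (the role name is illustrative):
+# | inputlookup splunkadmins_indexes_per_role | search roles=admin
+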
+[SearchHeadLevel - Search Queries summary exact match]
+action.email.useNSSubject = 1
+alert.track = 0
+description = Report only? Yes. This report is an attempt to use the Splunk audit logs to generate summary statistics on what indexes were accessed and the period of time they were accessed over. There is a lot of complexity here as the audit logs make this task very challenging. This version relates to entries where index=<indexname> was used without wildcards; an additional report "SearchHeadLevel - Search Queries summary non-exact match" also exists to perform this same function without an index specified or when wildcards are used. This report requires "SearchHeadLevel - Index access list by user" and "SearchHeadLevel - Macro report". Also note that you need to remove the comment around the lookup command within the search...this report works in Splunk 8.0 or newer (or 7.3 with some changes). Requires the splunkadmins_macros lookup file to exist; the datamodels, eventtypes and tags lookup files should also exist for this to be accurate. You may also wish to try the report "IndexerLevel - RemoteSearches Indexes Stats"; this uses the remote_searches.log and doesn't need to work with macros or similar as it runs on the indexing tier...Note pre Splunk 8.0 you will need to replace splunkadmins_audit_logs_macro_sub_v8 with splunkadmins_audit_logs_macro_sub. If you would prefer an alternative without extremely complex Splunk searches, refer to Sideview UI / https://apps.splunk.com/app/6449/ which has custom commands to do this work. Or use the remote searches in this app, which provide most of this data (although you cannot determine the username, so there is less context in remote searches). Finally, you may wish to take a look at https://github.com/TheWoodRanger/presentation-conf_24_audittrail_native_telemetry for a summary of options.
+dispatch.earliest_time = -1h
+dispatch.latest_time = now
+display.events.fields = ["index","sourcetype","host","source"]
+display.general.type = statistics
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = | multisearch \
+ [ search ```Last modified 2024-08-18 Attempt to extract out which indexes are accessed per search query by any search and compute statistics on them. The multisearch is only required if you want to capture sub-searches from join, append or similar, these require a bit more work so that's why the multisearch is there, in fact anything containing one of those keywords is dealt with in the second search, not this one... \
+ Note that the regexes need more work, for now, limits.conf [rex] match_limit = 1000000 is my workaround (main issue is the union/set/multisearch rex)``` \
+ index=_audit "info=completed" search_id!="'SummaryDirector_*" search_id!="'rsa_*" search_id!="'RemoteStorageRetrieveBuckets_*" search_id!="'ta_*" search_id!="'RemoteStorageRetrieveIndexes*" scan_count>0 \
+ | rex "(?s), search='(?P<search>.*)\]$" \
+ ```Removed due to excess matching, modern splunk versions appear to match search= more accurately | rex \"(?s)^(?:[^'\n]*'){4},\s+\w+='(?P<search>[\s\S]+)'\]($|\[[^\]]+\]$)\"``` \
+ | rex field=search mode=sed "s/\n/ /g" \
+ | rex field=search mode=sed "s/```.*?```/ /g" \
+ | eval search=if(substr(search,len(search),len(search)-1)=="'",substr(search,0,len(search)-1),search) \
+ | eval search_id=replace(search_id,"'","") \
+ | `search_type_from_sid(search_id)` \
+ | `base64decode(base64appname)` \
+ | eval app3="N/A" \
+ | eval app_name=coalesce(app,base64appname,app3) \
+ | fillnull app_name value="*" \
+ | eval splunk_server = `splunkadmins_splunk_server_name` \
+ ```Replace macros, but then replace datamodels, then tags, then eventtypes, but what if the eventtype refers to an eventtype? Or tag? Or more macros? This isn't perfect so just substitute and hope for the best. 
IndexerLevel - RemoteSearches Indexes Stats doesn't have all these issues so it may be safer to see what happens at indexing tier...Note pre Splunk 8.0 you will need to replace splunkadmins_audit_logs_macro_sub_v8 with splunkadmins_audit_logs_macro_sub``` \
+ | `splunkadmins_macro_sub("search")` \
+ | `splunkadmins_macro_sub("search")` \
+ | regex search="^\s*(\|?)\s*(search|tstats|mstats|mcatalog|multisearch|union|set|summarize|datamodel|from\s*:?\s*datamodel|datamodelsimple)\s+" \
+ | regex search!="\|\s*(append|union|multisearch|set|appendcols|appendpipe|join|map)" \
+ | `splunkadmins_audit_logs_datamodel_sub` \
+ | `splunkadmins_audit_logs_tags_sub` \
+ | `splunkadmins_audit_logs_eventtypes_sub` \
+ | `splunkadmins_audit_logs_eventtypes_sub` \
+ | `splunkadmins_audit_logs_tags_sub` \
+ | `splunkadmins_macro_sub("search")` \
+ | `splunkadmins_macro_sub("search")` \
+ | rex field=search mode=sed "s/\n/ /g"\
+ | rex field=search mode=sed "s/```.*?```/ /g" \
+ | rex field=search "(?s)^(?P<prepipe>\s*\|?([^\|]+))" ] \
+ [ search ```Attempt to extract out which indexes are accessed per search query by any search and compute statistics on them. This search works on searches with an append/multisearch or other command that has a slightly different regex requirement. Note had to nomv the multivalued field before concatenation or it silently disappeared!``` \
+ index=_audit "info=completed" search_id!="'SummaryDirector_*" search_id!="'rsa_*" search_id!="'RemoteStorageRetrieveBuckets_*" search_id!="'ta_*" search_id!="'RemoteStorageRetrieveIndexes*" scan_count>0 \
+ | rex "(?s), search='(?P<search>.*)\]$" \
+ ```Removed due to excess matching, modern splunk versions appear to match search= more accurately | rex \"(?s)^(?:[^'\n]*'){4},\s+\w+='(?P<search>[\s\S]+)'\]($|\[[^\]]+\]$)\"``` \
+ | rex field=search mode=sed "s/\n/ /g"\
+ | rex field=search mode=sed "s/```.*?```/ /g" \
+ | eval search=if(substr(search,len(search),len(search)-1)=="'",substr(search,0,len(search)-1),search) \
+ | eval search_id=replace(search_id,"'","") \
+ | `search_type_from_sid(search_id)` \
+ | `base64decode(base64appname)` \
+ | eval app3="N/A" \
+ | eval app_name=coalesce(app,base64appname,app3) \
+ | fillnull app_name value="*" \
+ | eval splunk_server = `splunkadmins_splunk_server_name` \
+ ```Replace macros, but then replace datamodels, then tags, then eventtypes, but what if the eventtype refers to an eventtype? Or tag? Or more macros? This isn't perfect so just substitute and hope for the best. 
IndexerLevel - RemoteSearches Indexes Stats doesn't have all these issues so it may be safer to see what happens at indexing tier...``` \ + | `splunkadmins_macro_sub("search")` \ + | `splunkadmins_macro_sub("search")` \ + | regex search="\|\s*(append|union|multisearch|set|appendcols|appendpipe|join|map)" \ + | eval len=len(search) \ + ``` we're likely to fail if the search is >50K characters, give up on that search result and move on ```\ + | where len<50000 \ + | `splunkadmins_audit_logs_datamodel_sub` \ + | `splunkadmins_audit_logs_tags_sub` \ + | `splunkadmins_audit_logs_eventtypes_sub` \ + | `splunkadmins_audit_logs_eventtypes_sub` \ + | `splunkadmins_audit_logs_tags_sub` \ + | `splunkadmins_macro_sub("search")` \ + | `splunkadmins_macro_sub("search")` \ + | rex field=search mode=sed "s/\n/ /g"\ + | rex field=search mode=sed "s/```.*?```/ /g" \ + | rex field=search max_match=50 "(?s)\|?\s*(append|appendcols|appendpipe|map|union)\s+\[(?P<subsearch>.*?)\]\s*(\||$)" \ + | rex field=search max_match=50 "(?s)\|?\s*(join)\s+.*?\[(?P<subsearch>.*?)\]\s*(\||$)" \ + | rex field=search max_match=50 "(?s)\|?\s*(union|set|multisearch)\s+(?P<part1>\[.*?\](\s*\[.*?\])+\s*(`[^`]+`\s*)*(\||$|',\s+))" \ + | rex field=part1 max_match=50 "(?s).*?\[(?P<subsearch>.*?)\]\s*(\||$|)" \ + | rex field=search max_match=50 "(?s)\|?\s*(map)\s+(maxsearches\s*=\s*\d+)?\s*search\s*=\s*\"(?P<subsearch>.*?)\"\s*(\||$)" \ + | rex field=search "^(?P<prepipe>\s*\|?([^\|]+))" \ + | rex field=subsearch "(?s)^\s*\|?(?P<prepipe_subsearch>([^\|]+))" \ + | nomv prepipe_subsearch \ + | eval prepipe = prepipe . " " . prepipe_subsearch \ + ] \ +| eval search=prepipe \ + ```The (index=* OR index=_*) index=<specific index> is a common use case for enterprise security, also some individuals like doing a similar trick so remove the index=*... 
as this is not a wildcard index search``` \ +| rex field=search "(?P<esstylewildcard>\(\s*index=\*\s+OR\s+index=_\*\s*\))" \ +| rex mode=sed field=search "s/search index=\s*\S+\s+index\s*=/search index=/g" \ +| eval search_head=host \ +| eval search_head_cluster=`search_head_cluster` \ +| rex ", savedsearch_name=\"(?P<savedsearch_name>[^\"]+)\","\ +| stats values(total_run_time) AS total_run_time, values(event_count) AS event_count, values(scan_count) AS scan_count, values(search) AS search, values(search_et) AS search_et, values(search_lt) AS search_lt, values(savedsearch_name) AS savedsearch_name, min(_time) AS timestamp, values(search_head_cluster) AS search_head_cluster, max(duration_command_search_index) AS duration_index, max(duration_command_search_rawdata) AS duration_rawdata, max(invocations_command_search_index_bucketcache_hit) AS cache_index_hits, max(invocations_command_search_index_bucketcache_miss) AS cache_index_miss, max(duration_command_search_index_bucketcache_hit) AS cache_index_hit_duration, max(duration_command_search_index_bucketcache_miss) AS cache_index_miss_duration, max(invocations_command_search_rawdata_bucketcache_hit) AS cache_rawdata_hits, max(invocations_command_search_rawdata_bucketcache_miss) AS cache_rawdata_miss, max(duration_command_search_rawdata_bucketcache_hit) AS cache_rawdata_hit_duration, max(duration_command_search_rawdata_bucketcache_miss) AS cache_rawdata_miss_duration, values(esstylewildcard) AS esstylewildcard, values(provenance) AS provenance by user, type, search_id, app_name \ + ```We now deal with cases where search earliest/latest times were not specified, assume all time is about 1 year in the past and latest time was the search run time``` \ +| eval search_lt=if(search_lt=="N/A",timestamp,search_lt), search_et=if(search_et=="N/A",now()-(365*24*60*60),search_et) \ + ```Extract out index= or index IN (a,b,c) but avoid NOT index in (...) and NOT index=... 
and also NOT (...anything) statements``` \
+| rex field=search "(?s)(NOT\s+index(\s*=\s*|::)[^ ]+)|(NOT\s+\([^\)]+\))|(index(\s*=\s*|::)\"?(?P<indexregex>[\*A-Za-z0-9-_]+))" max_match=50 \
+| rex field=search "(?s)(NOT\s+index\s+[iI][nN]\s*\([^\)]+)|(index\s+[iI][nN]\s*\((?P<indexin>([^\)\"]+)|\"[^\)\"]+\"))" max_match=50 \
+| makemv delim="," indexin \
+| makemv delim=" " indexin \
+| eval indexes=mvappend(indexregex,indexin) \
+| eval indexes=if(isnotnull(esstylewildcard),mvfilter(NOT match(indexes,"^_?\*$")),indexes) \
+| eval wildcard=mvfilter(match(indexes,"\*")) \
+| where isnull(wildcard) \
+| eval indexes=mvmap(indexes, replace(lower(indexes), "\"", "")) \
+| eval indexes=mvmap(indexes, trim(replace(indexes, "'", ""))) \
+| eval indexes=mvdedup(indexes) \
+| eval multi=if(mvcount(indexes)>1,"true","false") \
+| stats values(timestamp) AS _time, values(total_run_time) AS total_run_time, values(event_count) AS event_count, values(scan_count) AS scan_count, values(search_et) AS search_et, values(search_lt) AS search_lt, values(savedsearch_name) AS savedsearch_name, values(multi) AS multi, max(duration_index) AS duration_index, max(duration_rawdata) AS duration_rawdata, max(cache_index_hits) AS cache_index_hits, max(cache_index_miss) AS cache_index_miss, max(cache_index_hit_duration) AS cache_index_hit_duration, max(cache_index_miss_duration) AS cache_index_miss_duration, max(cache_rawdata_hits) AS cache_rawdata_hits, max(cache_rawdata_miss) AS cache_rawdata_miss, max(cache_rawdata_hit_duration) AS cache_rawdata_hit_duration, max(cache_rawdata_miss_duration) AS cache_rawdata_miss_duration, values(provenance) AS provenance by user, type, indexes, search_head_cluster, search_id, app_name \
+| eval period=search_lt-search_et \
+| fields - indexin, indexregex \
+ ```Commands like multikv result in giant event count numbers compared to scan count, lower the lispy back down to normal to prevent the stats from being broken. lispy efficiency as per Martin Muller's conf presentations``` \
+| eval lispy_efficiency = if(event_count>scan_count,scan_count,event_count) / scan_count
+
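+# "lispy efficiency" above is the ratio of events returned to events scanned off disk; values
+# close to 1 mean the generated lispy filtered well, values near 0 mean Splunk scanned far
+# more than it kept. A sketch with purely illustrative numbers (yields 0.002 here):
+# | makeresults | eval event_count=200, scan_count=100000
+# | eval lispy_efficiency = if(event_count>scan_count,scan_count,event_count) / scan_count
+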
+[SearchHeadLevel - Search Queries summary non-exact match]
+action.email.useNSSubject = 1
+alert.track = 0
+description = Report only? Yes. This report is an attempt to use the Splunk audit logs to generate summary statistics on what indexes were accessed and the period of time they were accessed over. There is a lot of complexity here as the audit logs make this task very challenging. This version relates to entries where either index names are specified with wildcards or no index is specified; an additional report "SearchHeadLevel - Search Queries summary exact match" also exists to perform this same function where an index=<indexname> is specified. This report requires "SearchHeadLevel - Index access list by user" and "SearchHeadLevel - Macro report". Also note that you need to remove the comment around the lookup within the search...this report works on Splunk 8.0 or newer or 7.3 with some modification. Requires the splunkadmins_macros and splunkadmins_indexes_per_role lookup files to exist. Note pre Splunk 8.0 you will need to replace splunkadmins_audit_logs_macro_sub_v8 with splunkadmins_audit_logs_macro_sub. Note that this search utilises the streamfilterwildcard custom search command included in the TA-Alerts for SplunkAdmins application on SplunkBase (or github). The Sideview UI / https://apps.splunk.com/app/6449/ app offers an alternative way to read the audit log files with custom commands instead. The RemoteSearches example logs also have the majority of this data but lack context, such as the username, that is available in the audit.log files. Finally, you may wish to take a look at https://github.com/TheWoodRanger/presentation-conf_24_audittrail_native_telemetry for a summary of options.
+dispatch.earliest_time = -1h
+dispatch.latest_time = now
+display.events.fields = ["index","sourcetype","host","source","indextime","count"]
+display.general.type = statistics
+display.page.search.tab = statistics
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = | multisearch \
+ [ search ```Last modified 2022-02-14 Attempt to extract out which indexes are accessed per search query by any search and compute statistics on them. The multisearch is only required if you want to capture sub-searches from join, append or similar, these require a bit more work so that's why the multisearch is there, in fact anything containing one of those keywords is dealt with in the second search, not this one... \
+ Note that the regexes need more work, for now, limits.conf [rex] match_limit = 1000000 is my workaround (main issue is the union/set/multisearch rex)``` \
+ index=_audit "info=completed" search_id!="'SummaryDirector_*" search_id!="'rsa_*" search_id!="'RemoteStorageRetrieveBuckets_*" search_id!="'ta_*" search_id!="'RemoteStorageRetrieveIndexes*" scan_count>0 \
+ | rex "(?s), search='(?P<search>.*)\]$" \
+ ```Removed due to excess matching, modern splunk versions appear to match search= more accurately | rex \"(?s)^(?:[^'\n]*'){4},\s+\w+='(?P<search>[\s\S]+)'\]($|\[[^\]]+\]$)\"``` \
+ | rex field=search mode=sed "s/\n/ /g"\
+ | rex field=search mode=sed "s/```.*?```/ /g" \
+ | eval search=if(substr(search,len(search),len(search)-1)=="'",substr(search,0,len(search)-1),search) \
+ | eval search_id=replace(search_id,"'","") \
+ | `search_type_from_sid(search_id)` \
+ | `base64decode(base64appname)` \
+ | eval app3="N/A" \
+ | eval app_name=coalesce(app,base64appname,app3) \
+ | fillnull app_name value="*" \
+ | eval splunk_server = `splunkadmins_splunk_server_name` \
+ ```Replace macros, but then replace datamodels, then tags, then eventtypes, but what if the eventtype refers to an eventtype? Or tag? Or more macros? This isn't perfect so just substitute and hope for the best. IndexerLevel - RemoteSearches Indexes Stats doesn't have all these issues so it may be safer to see what happens at indexing tier...``` \
+ | `splunkadmins_macro_sub("search")` \
+ | `splunkadmins_macro_sub("search")` \
+ | regex search="^\s*(\|?)\s*(search|tstats|mstats|mcatalog|multisearch|union|set|summarize|datamodel|from\s*:?\s*datamodel|datamodelsimple)\s+" \
+ | regex search!="\|\s*(append|union|multisearch|set|appendcols|appendpipe|join|map)" \
+ | `splunkadmins_audit_logs_datamodel_sub` \
+ | `splunkadmins_audit_logs_tags_sub` \
+ | `splunkadmins_audit_logs_eventtypes_sub` \
+ | `splunkadmins_audit_logs_eventtypes_sub` \
+ | `splunkadmins_audit_logs_tags_sub` \
+ | `splunkadmins_macro_sub("search")` \
+ | `splunkadmins_macro_sub("search")` \
+ | rex field=search mode=sed "s/\n/ /g"\
+ | rex field=search mode=sed "s/```.*?```/ /g" \
+ | rex field=search "(?s)^(?P<prepipe>\s*\|?([^\|]+))" ] \
+ [ search ```Attempt to extract out which indexes are accessed per search query by any search and compute statistics on them. This search works on searches with an append/multisearch or other command that has a slightly different regex requirement. 
Note had to nomv the multivalued field before concatenation or it silently disappeared!``` \
+ index=_audit "info=completed" search_id!="'SummaryDirector_*" search_id!="'rsa_*" search_id!="'RemoteStorageRetrieveBuckets_*" search_id!="'ta_*" search_id!="'RemoteStorageRetrieveIndexes*" scan_count>0 \
+ | rex "(?s), search='(?P<search>.*)\]$" \
+ ```Removed due to excess matching, modern splunk versions appear to match search= more accurately | rex \"(?s)^(?:[^'\n]*'){4},\s+\w+='(?P<search>[\s\S]+)'\]($|\[[^\]]+\]$)\"``` \
+ | rex field=search mode=sed "s/\n/ /g"\
+ | rex field=search mode=sed "s/```.*?```/ /g" \
+ | eval search=if(substr(search,len(search),len(search)-1)=="'",substr(search,0,len(search)-1),search) \
+ | eval search_id=replace(search_id,"'","") \
+ | `search_type_from_sid(search_id)` \
+ | `base64decode(base64appname)` \
+ | eval app3="N/A" \
+ | eval app_name=coalesce(app,base64appname,app3) \
+ | eval splunk_server = `splunkadmins_splunk_server_name` \
+ | fillnull app_name value="*" \
+ ```Replace macros, but then replace datamodels, then tags, then eventtypes, but what if the eventtype refers to an eventtype? Or tag? Or more macros? This isn't perfect so just substitute and hope for the best. IndexerLevel - RemoteSearches Indexes Stats doesn't have all these issues so it may be safer to see what happens at indexing tier...``` \
+ | `splunkadmins_macro_sub("search")` \
+ | `splunkadmins_macro_sub("search")` \
+ | regex search="\|\s*(append|union|multisearch|set|appendcols|appendpipe|join|map)" \
+ | eval len=len(search) \
+ ``` we're likely to fail if the search is >50K characters, give up on that search result and move on ```\
+ | where len<50000 \
+ | `splunkadmins_audit_logs_datamodel_sub` \
+ | `splunkadmins_audit_logs_tags_sub` \
+ | `splunkadmins_audit_logs_eventtypes_sub` \
+ | `splunkadmins_audit_logs_eventtypes_sub` \
+ | `splunkadmins_audit_logs_tags_sub` \
+ | `splunkadmins_macro_sub("search")` \
+ | `splunkadmins_macro_sub("search")` \
+ | rex field=search mode=sed "s/\n/ /g"\
+ | rex field=search mode=sed "s/```.*?```/ /g" \
+ | rex field=search max_match=50 "(?s)\|?\s*(append|appendcols|appendpipe|map|union)\s+\[(?P<subsearch>.*?)\]\s*(\||$)" \
+ | rex field=search max_match=50 "(?s)\|?\s*(join)\s+.*?\[(?P<subsearch>.*?)\]\s*(\||$)" \
+ | rex field=search max_match=50 "(?s)\|?\s*(union|set|multisearch)\s+(?P<part1>\[.*?\](\s*\[.*?\])+\s*(`[^`]+`\s*)*(\||$|',\s+))" \
+ | rex field=part1 max_match=50 "(?s).*?\[(?P<subsearch>.*?)\]\s*(\||$|)" \
+ | rex field=search max_match=50 "(?s)\|?\s*(map)\s+(maxsearches\s*=\s*\d+)?\s*search\s*=\s*\"(?P<subsearch>.*?)\"\s*(\||$)" \
+ | rex field=search "^(?P<prepipe>\s*\|?([^\|]+))" \
+ | rex field=subsearch "(?s)^\s*\|?(?P<prepipe_subsearch>([^\|]+))" \
+ | nomv prepipe_subsearch \
+ | eval prepipe = prepipe . " " . prepipe_subsearch ] \
+| eval search=prepipe \
+ ```The (index=* OR index=_*) index=<specific index> is a common use case for enterprise security, also some individuals like doing a similar trick so remove the index=*... as this is not a wildcard index search``` \
+| rex field=search "(?P<esstylewildcard>\(\s*index=\*\s+OR\s+index=_\*\s*\))" \
+| rex mode=sed field=search "s/search index=\s*\S+\s+index\s*=/search index=/g" \
+ ```Extract out index= or index IN (a,b,c) but avoid NOT index in (...) and NOT index=... 
and also NOT (...anything) statements``` \ +| rex field=search "(?s)(NOT\s+index(\s*=\s*|::)[^ ]+)|(NOT\s+\([^\)]+\))|(index(\s*=\s*|::)\"?(?P<indexregex>[\*A-Za-z0-9-_]+))" max_match=50 \ +| rex field=search "(?s)(NOT\s+index\s+[iI][nN]\s*\([^\)]+)|(index\s+[iI][nN]\s*\((?P<indexin>([^\)\"]+)|\"[^\)\"]+\"))" max_match=50 \ +| makemv delim="," indexin \ +| makemv delim=" " indexin \ +| eval indexes=mvappend(indexregex,indexin) \ +| eval indexes=if(isnotnull(esstylewildcard),mvfilter(NOT match(indexes,"^_?\*$")),indexes) \ +| eval wildcard=mvfilter(match(indexes,"\*")) \ +| where isnotnull(wildcard) OR isnull(indexes) \ +| eval short=mvmap(indexes,if(len(indexes)<=3,"True",null())) \ +| eval short=if(isnull(short),"False","True") \ + ```We now deal with cases where search earliest/latest times were not specified; assume "all time" is about 1 year in the past and the latest time was the search run time``` \ +| eval search_lt=if(search_lt=="N/A",_time,search_lt), search_et=if(search_et=="N/A",now()-(365*24*60*60),search_et) \ +| eval period=search_lt-search_et \ + ```Now that we have a giant list of indexes, we want to strip any quote characters and lowercase them in case we use a kvstore for lookups or similar. \ + Run a lookup to find the default indexes and the allowed indexes per user``` \ +| eval roles=replace(roles,"'","") \ +| makemv roles delim="+" \ +| lookup splunkadmins_indexes_per_role roles, splunk_server \ +| rex ", savedsearch_name=\"(?P<savedsearch_name>[^\"]+)\","\ +| fields indexes, user, period, total_run_time, event_count, scan_count, srchIndexesAllowed, srchIndexesDefault, search_id, search_et, search_lt, host, app_name, savedsearch_name, type, duration_command_search_index, duration_command_search_rawdata, invocations_command_search_index_bucketcache_hit, invocations_command_search_index_bucketcache_miss, duration_command_search_index_bucketcache_hit, duration_command_search_index_bucketcache_miss, invocations_command_search_rawdata_bucketcache_hit, invocations_command_search_rawdata_bucketcache_miss, duration_command_search_rawdata_bucketcache_hit, duration_command_search_rawdata_bucketcache_miss, short, _time, provenance \ +| makemv srchIndexesAllowed tokenizer=(\S+) \ +| streamfilterwildcard pattern=indexes fieldname=indexes srchIndexesAllowed \ +| eval indexes=if(isnull(indexes),srchIndexesDefault,indexes) \ +| eval indexes=mvmap(indexes, replace(lower(indexes), "\"", "")) \ +| eval indexes=mvmap(indexes, trim(replace(indexes, "'", ""))) \ +| makemv indexes tokenizer=(\S+) \ +| eval search_head=host \ +| eval search_head_cluster=`search_head_cluster` \ +| eval indexes=mvdedup(indexes) \ +| eval multi=if(mvcount(indexes)>1,"true","false") \ +| stats values(_time) AS _time, values(total_run_time) AS total_run_time, values(event_count) AS event_count, values(scan_count) AS scan_count, values(search_et) AS search_et, values(search_lt) AS search_lt, values(savedsearch_name) AS savedsearch_name, values(multi) AS multi, max(duration_command_search_index) AS duration_index, max(duration_command_search_rawdata) AS duration_rawdata, max(invocations_command_search_index_bucketcache_hit) AS cache_index_hits, max(invocations_command_search_index_bucketcache_miss) AS cache_index_miss, max(duration_command_search_index_bucketcache_hit) AS cache_index_hit_duration, max(duration_command_search_index_bucketcache_miss) AS cache_index_miss_duration, max(invocations_command_search_rawdata_bucketcache_hit) AS cache_rawdata_hits, max(invocations_command_search_rawdata_bucketcache_miss) AS
cache_rawdata_miss, max(duration_command_search_rawdata_bucketcache_hit) AS cache_rawdata_hit_duration, max(duration_command_search_rawdata_bucketcache_miss) AS cache_rawdata_miss_duration, values(provenance) AS provenance by user, type, search_id, indexes, search_head_cluster, app_name, short \ +| eval period=search_lt-search_et \ + ```Commands like multikv result in giant event counts compared to the scan count; lower the lispy efficiency back down to normal to prevent the stats from being broken. Lispy efficiency as per Martin Muller's conf presentations``` \ +| eval lispy_efficiency = if(event_count>scan_count,scan_count,event_count) / scan_count + +[SearchHeadLevel - Search Queries summary exact match by user] +action.email.useNSSubject = 1 +alert.track = 0 +description = Report only? Yes. This report uses the "SearchHeadLevel - Search Queries summary exact match" report and then reports per user +dispatch.earliest_time = -4h@m +dispatch.latest_time = now +display.events.fields = ["index","sourcetype","host","source"] +display.general.type = statistics +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = | savedsearch "SearchHeadLevel - Search Queries summary exact match"\ + ```TODO: break down the counts into time periods, perhaps 1 hour, 4 hours, 8 hours, 24 hours, 3 days, 7 days, 14 days, 21 days, 30 days, 2 months, 3 months, 6 months, >60 months? or similar. Do a count for each one as that would be useful for a dashboard...``` \ +| stats count, dc(indexes) AS indexes, max(period) AS maxPeriod, avg(period) AS avgPeriod, median(period) AS medianPeriod, \ + avg(total_run_time) AS avg_total_run_time, max(total_run_time) AS max_total_run_time, median(total_run_time) AS median_total_run_time, avg(lispy_efficiency) AS avg_lispy_efficiency, max(lispy_efficiency) AS max_lispy_efficiency, min(lispy_efficiency) AS min_lispy_efficiency, median(lispy_efficiency) AS median_lispy_efficiency by user\ +| fillnull max_lispy_efficiency, min_lispy_efficiency, median_lispy_efficiency \ +| eval maxPeriod=tostring(maxPeriod,"duration"), avgPeriod=tostring(avgPeriod,"duration"), medianPeriod=tostring(medianPeriod,"duration") + +[SearchHeadLevel - Search Queries summary exact match by index] +action.email.useNSSubject = 1 +alert.track = 0 +description = Report only? Yes. This report uses the "SearchHeadLevel - Search Queries summary exact match" report and then reports per index +dispatch.earliest_time = -4h@m +dispatch.latest_time = now +display.events.fields = ["index","sourcetype","host","source"] +display.general.type = statistics +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = | savedsearch "SearchHeadLevel - Search Queries summary exact match"\ +| eval period=case(period<900,"a<=15 minutes",period<3600,"b<=1hour",period<=14400,"c<=4hours",period<=86400,"d<=24hours",period<=604800,"e<=7days",period<=1209600,"f<=14days",period<=2592000,"g<=30days",period<=5184000,"h<=60days",period<=7776000,"i<=90days",period<=15552000,"j<=180days",period>15552000,"k>180days")\ +| stats count by indexes, period\ +| rename indexes AS index\ + ```Temporary hack to make this visualize without errors below...```\ +| eval indexfirstletter=substr(index,0,1)\ +| stats sum(count) AS count by indexfirstletter, period\ +| xyseries period indexfirstletter count +
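+# The "Search Queries summary exact match" search above mentions raising the rex match_limit in
+# limits.conf as a workaround for its heavier regexes. A minimal sketch of that workaround on the
+# search head (the value is the one quoted in the search comment; tune it for your environment):
+# [rex]
+# match_limit = 1000000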
+[SearchHeadLevel - IndexesPerUser Report] +action.email.useNSSubject = 1 +alert.track = 0 +description = Report only? Yes. This report requires the splunkadmins_indexlist lookup; it lists the indexes accessible per user from a local server. Requires the "SearchHeadLevel - Index list report" report to be run to populate the lookup file splunkadmins_indexlist +dispatch.earliest_time = -5m +dispatch.latest_time = now +display.events.fields = ["index","sourcetype","host","source"] +display.general.type = statistics +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = | rest /services/authorization/roles splunk_server="local" \ +| eval comment="This search aims to provide a giant list of users and what indexes they have access to (as in a list of index names, not a list of wildcards). Due to mvexpand hitting memory limits in the environment this alternative version runs many subsearches that do not hit the memory limits" \ +| table title, srchIndexesAllowed, srchIndexesDefault, imported_srchIndexesAllowed, imported_srchIndexesDefault \ +| rename title as roles \ +| makemv srchIndexesAllowed tokenizer=(\S+) \ +| makemv srchIndexesDefault tokenizer=(\S+) \ +| makemv imported_srchIndexesAllowed tokenizer=(\S+) \ +| makemv imported_srchIndexesDefault tokenizer=(\S+) \ +| eval srchIndexesAllowed = mvappend(srchIndexesAllowed, imported_srchIndexesAllowed) \ +| eval srchIndexesDefault = mvappend(srchIndexesDefault, imported_srchIndexesDefault) \ +| fillnull srchIndexesDefault, srchIndexesAllowed value="requiredformvexpand" \ +| mvexpand srchIndexesAllowed \ +| eval srchIndexesAllowed=if(srchIndexesAllowed=="requiredformvexpand",null(),srchIndexesAllowed) \ +| eval srchIndexesAllowed=lower(srchIndexesAllowed) \ +| fields srchIndexesAllowed, srchIndexesDefault, roles \ +| eval splunk_server="default" \ +| append [ | makeresults | eval srchIndexesAllowed="", srchIndexesDefault="", roles="novalidroles", splunk_server="default" | fields - _time ]\ +| map \ + "SearchHeadLevel - IndexesPerRole srchIndexesallowed Report" maxsearches=5000 \ +| stats values(index) AS srchIndexesAllowed, values(srchIndexesDefault) AS srchIndexesDefault by roles, splunk_server \ +| makemv srchIndexesDefault tokenizer=(\S+) \ +| mvexpand srchIndexesDefault \ +| map \ + "SearchHeadLevel - IndexesPerRole srchIndexesdefault Report" maxsearches=5000 \ +| stats values(srchIndexesAllowed) AS srchIndexesAllowed, values(index) AS srchIndexesDefault by roles, splunk_server \ +| where roles!="novalidroles" \ +| makemv srchIndexesAllowed tokenizer=(\S+) \ +| append \ + [| rest /services/authentication/users f=type f=roles `splunkadmins_restmacro` \ + | table title, roles \ + | rename title AS user \ + | mvexpand roles ] \ +| append \ + [| makeresults \ + | eval user="splunk-system-user", roles="admin" ]\ +| eval srchIndexesDefault = if(srchIndexesDefault=="requiredformvexpand",null(),srchIndexesDefault) \ +| eventstats values(srchIndexesAllowed) AS srchIndexesAllowed, values(srchIndexesDefault) AS srchIndexesDefault by roles \ +| stats values(srchIndexesAllowed) AS srchIndexesAllowed, values(srchIndexesDefault) AS srchIndexesDefault by user +
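+# A sketch of how the IndexesPerUser output might be persisted and queried; the lookup name
+# below is illustrative and not part of the app:
+# | savedsearch "SearchHeadLevel - IndexesPerUser Report" | outputlookup indexes_per_user.csv
+# | inputlookup indexes_per_user.csv | search user="someuser"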
+[SearchHeadLevel - SHC Captain unable to establish common bundle] +action.keyindicator.invert = 0 +alert.suppress = 0 +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 3 +counttype = number of events +cron_schedule = 57 * * * * +description = Chance the alert requires action? Moderate. Failures to establish a common search bundle result in delayed searches; this can occur per member if distsearch.conf is changed +dispatch.earliest_time = -1h@h +dispatch.latest_time = now +display.events.fields = ["source","sourcetype","host"] +display.visualizations.charting.chart = bar +enableSched = 1 +quantity = 0 +relation = greater than +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = ```Attempt to detect bundle issues on any member of the search head cluster for an extended period of time. It can fail per member if distsearch is customised``` index=_internal `searchheadhosts` sourcetype=splunkd `splunkadmins_splunkd_source` "Gave up waiting for the captain to establish a common bundle version" OR "Cannot determine a latest common bundle" OR (log_level=WARN AND component=DistributedBundleReplicationManager ```This is confirmed as an invalid warning message in Splunk 9``` NOT "Failed to touch bundle=, checksum=0 (manual preparation): No such file or directory") \ +| search ```Exclude shutdown times``` NOT [`splunkadmins_shutdown_time(indexerhosts,0,0)`]\ +| timechart span=5m count by host\ +| fillnull\ +| untable _time, host, count\ +| stats max(_time) AS mostRecent, min(_time) AS firstSeen, last(count) AS lastCount by host\ +| eval mostRecent=strftime(mostRecent, "%+"), firstSeen=strftime(firstSeen, "%+")\ +| where lastCount>0 +disabled = 1 + +[SearchHeadLevel - SHC conf log summary] +action.email.useNSSubject = 1 +alert.track = 0 +description = Report only? Yes. This report attempts to summarise replication activity within the search head cluster via the conf.log file; the report 'SearchHeadLevel - Lookup updates within SHC' is designed more specifically for lookup updates only +dispatch.earliest_time = -60m +dispatch.latest_time = now +display.events.fields = ["index","sourcetype","host","source"] +display.general.type = statistics +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = ```Measure search head clustering writes to particular config objects via the conf.log file``` index=_internal source=*conf.log data.task=addCommit sourcetype=splunkd_conf `searchheadhosts`\ +| spath output=username path=data.asset_uri{0}\ +| spath output=app path=data.asset_uri{1}\ +| spath output=type path=data.asset_uri{2}\ +| spath output=objname path=data.asset_uri{3}\ +| fields objname, type, username, app, data.asset_id, host\ +| stats count by data.asset_id, objname, type, username, app, host\ +| fields - data.asset_id\ +| sort 0 - count\ +| table count, objname, type, app, username, host +
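+# The platform_stats reports below ship with enableSched = 0 and their mcollect line commented
+# out. To actually collect the metrics, uncomment the final line and point it at a real metrics
+# index (my_metrics_index below is a placeholder, not part of the app), keeping
+# realtime_schedule = 0 as the descriptions note:
+# | mcollect index=my_metrics_index split=true prefix_field=prefix indexer_cluster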
+[IndexerLevel - platform_stats.counters hosts] +action.email.useNSSubject = 1 +alert.track = 0 +cron_schedule = 45 * * * * +description = Report only? Yes. Metrics? Yes. This summary (mcollect) search attempts to provide a count of the number of unique hostnames sending data to Splunk (note realtime_schedule = 0) +dispatch.earliest_time = -5m +dispatch.latest_time = now +display.events.fields = ["index","sourcetype","host"] +display.general.type = statistics +enableSched = 0 +realtime_schedule = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = | tstats dc(host) AS unique_hosts where index=* earliest=-24h, latest=+5y _index_earliest=-1h \ +| eval indexer_cluster=`indexer_cluster_name`\ +| eval prefix="platform_stats.counters."\ +| addinfo \ +| rename info_max_time AS _time \ +| fields - info_* \ + ```mcollect index=a_metrics_index split=true prefix_field=prefix indexer_cluster``` + +[IndexerLevel - platform_stats.counters hosts 24hour] +action.email.useNSSubject = 1 +alert.track = 0 +cron_schedule = 0 1 * * * +description = Report only? Yes. Metrics? Yes. This summary (mcollect) search attempts to provide a daily count of the number of unique hostnames sending data to Splunk over a 24 hour period (note realtime_schedule = 0) +dispatch.earliest_time = -24h@h +dispatch.latest_time = now +display.events.fields = ["index","sourcetype","host"] +display.general.type = statistics +realtime_schedule = 0 +enableSched = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = | tstats dc(host) AS unique_hosts_24hr where index=*, earliest=-48h, latest=+5y _index_earliest=-24h\ +| eval indexer_cluster=`indexer_cluster_name`\ +| eval prefix="platform_stats.counters."\ +| addinfo \ +| rename info_max_time AS _time \ +| fields - info_* \ + ```mcollect index=a_metrics_index split=true prefix_field=prefix indexer_cluster``` + +[IndexerLevel - platform_stats.indexers totalgb measurement] +action.email.useNSSubject = 1 +alert.track = 0 +cron_schedule = */10 * * * * +description = Report only? Yes. Metrics? Yes. This summary (mcollect) search attempts to measure the amount of data going through the platform (note realtime_schedule = 0) +dispatch.earliest_time = -15m@m +dispatch.latest_time = -5m@m +display.events.fields = ["index","sourcetype","host"] +display.general.type = statistics +realtime_schedule = 0 +enableSched = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = index=_internal `licensemasterhost` type=usage sourcetype=splunkd source=*license_usage.log*\ + ```Alternative query IndexerLevel - platform_stats.indexers totalgb_thruput measurement if you want the _internal indexes among others. This query uses license usage instead as we are measuring what non-internal data we are indexing```\ +| stats sum(b) AS totalbytes by i \ +| append \ + [| rest /services/search/distributed/peers \ + | fields guid peerName | rename guid AS i ] \ +| eval totalgb=totalbytes/1024/1024/1024\ +| eventstats values(peerName) AS indexer by i\ +| where totalgb>0\ +| stats sum(totalgb) AS totalgb by indexer\ +| eval prefix="platform_stats.indexers."\ +| eval indexer_cluster=`indexer_cluster_name(indexer)` \ +| addinfo \ +| rename info_max_time AS _time \ +| fields - info_* \ + ```mcollect index=a_metrics_index split=true prefix_field=prefix indexer indexer_cluster``` +
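+# Before scheduling the totalgb measurement above, a quick ad hoc sanity check of its output
+# can be run from the search bar (a sketch, not part of the app):
+# | savedsearch "IndexerLevel - platform_stats.indexers totalgb measurement"
+# | table _time, indexer, indexer_cluster, totalgb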
+[IndexerLevel - platform_stats.indexers totalgb_thruput measurement] +action.email.useNSSubject = 1 +alert.track = 0 +cron_schedule = */10 * * * * +description = Report only? Yes. Metrics? Yes. This summary (mcollect) search attempts to measure the amount of data going through the platform (note realtime_schedule = 0) +dispatch.earliest_time = -15m@m +dispatch.latest_time = -5m@m +display.events.fields = ["index","sourcetype","host"] +display.general.type = statistics +realtime_schedule = 0 +enableSched = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = index=_internal `indexerhosts` TERM(group=thruput) TERM(name=index_thruput) `splunkadmins_metrics_source` sourcetype=splunkd \ +| stats sum(kb) as totalkb by host \ +| eval totalgb_thruput = totalkb/1024/1024 \ +| where totalgb_thruput>0 \ +| eval prefix="platform_stats.indexers." \ +| eval indexer_cluster=`indexer_cluster_name(host)` \ +| addinfo \ +| rename info_max_time AS _time \ +| fields - info_*, totalkb \ +| rename host AS indexer \ + ```mcollect index=a_metrics_index split=true prefix_field=prefix indexer indexer_cluster``` + +[IndexerLevel - platform_stats.indexers stddev measurement] +action.email.useNSSubject = 1 +alert.track = 0 +cron_schedule = */10 * * * * +description = Report only? Yes. Metrics? Yes. This summary (mcollect) search attempts to measure the variance between indexer peers (note realtime_schedule = 0); span based on https://github.com/silkyrich/cluster_health_tools/tree/master/default/data/ui/views/indexer_performance.xml +dispatch.earliest_time = -15m@m +dispatch.latest_time = -5m@m +display.events.fields = ["index","sourcetype","host"] +display.general.type = statistics +realtime_schedule = 0 +enableSched = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = index=_internal `indexerhosts` sourcetype=splunkd `splunkadmins_metrics_source` TERM(group=thruput) TERM(name=thruput) \ +| bin span=62sec _time \ +| eval indexer_cluster=`indexer_cluster_name(host)` \ +| stats sum(instantaneous_kbps) as instantaneous_kbps by host _time indexer_cluster \ +| stats stdev(instantaneous_kbps) AS stdev_kbps by indexer_cluster, _time \ +| eval prefix="platform_stats.indexers." \ + ```mcollect index=a_metrics_index split=true prefix_field=prefix indexer_cluster``` + +[IndexerLevel - platform_stats.indexers stddev incoming measurement] +action.email.useNSSubject = 1 +alert.track = 0 +cron_schedule = */10 * * * * +description = Report only? Yes. Metrics? Yes. This summary (mcollect) search attempts to measure the variance between indexer peers (note realtime_schedule = 0) from an incoming forwarder point of view, by forwarder group +dispatch.earliest_time = -15m@m +dispatch.latest_time = -5m@m +display.events.fields = ["index","sourcetype","host"] +display.general.type = statistics +realtime_schedule = 0 +enableSched = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = index=_internal `indexerhosts` sourcetype=splunkd `splunkadmins_metrics_source` TERM(group=tcpin_connections) \ +| eval name=`forwarder_name(hostname)` \ +| eval indexer_cluster=`indexer_cluster_name(host)` \ +| bin _time span=1m \ +| stats sum(kb) AS kb by host, indexer_cluster, name, _time \ +| eval kb=kb/60 \ +| stats stdev(kb) AS platform_stats.indexers.deviation_incoming by _time, indexer_cluster, name \ + ```| eval prefix="platform_stats.indexers." | mcollect index=a_metrics_index split=true prefix_field=prefix indexer_cluster, name``` +
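+# Once the mcollect lines are enabled, the collected measurements can be read back with mstats,
+# e.g. (assuming the placeholder metrics index from the earlier note):
+# | mstats avg(platform_stats.indexers.totalgb) WHERE index=my_metrics_index span=10m BY indexer_cluster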
+[SearchHeadLevel - platform_stats.audit metrics searches] +action.email.useNSSubject = 1 +alert.track = 0 +cron_schedule = */10 * * * * +description = Report only? Yes. Metrics? Yes. This summary (mcollect) search attempts to measure queries per day from _audit logs (note realtime_schedule = 0). Note: tested on 7.3 only, may not work on earlier versions +dispatch.earliest_time = -15m@m +dispatch.latest_time = -5m@m +display.events.fields = ["index","sourcetype","host","source"] +display.page.search.tab = statistics +enableSched = 0 +realtime_schedule = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = ```Count the number of Splunk queries that use a search command, excluding rest/data model acceleration et cetera``` \ + index=_audit ", info=granted " search_id!="'rsa_*" \ +| rex "(?s), search='(?P<search>.*)\]$" \ +| regex search!="^(\| (copy|archive)buckets|typeahead|\s*archivebuckets)" \ + ```Unsure what this search does but it appears to be running when no reports are accelerated...``` \ +| regex search!="^summarize (tstats=t maintain=\"\" summaryprefix=\"[^\"]+\"|maintain=\"%22SUMMARY_ID%22%2C%22EARLIEST_TIME%22%2C%22REMOTE_SEARCH%22%2C%22NORM_SUMMARY_ID%22%2C%22NORM_REMOTE_SEARCH%22%0A\" summaryprefix=\"[^\"]+\")$" \ +| rex "info=granted , search_id='(?P<search_id>[^']+)" \ +| rex "', savedsearch_name=\"(?P<savedsearch_name>[^\"]*)" \ +| `search_type_from_sid(search_id)` \ + ```Split out the information by system vs non-system users, and by adhoc/scheduled/dashboards (as accurately as possible); furthermore, you can trigger ad-hoc searches via scripted inputs, which is similar to a scheduled search (but not via the scheduler)``` \ +| eval user=if(user=="admin" OR user=="splunk-system-user","system","other") \ +| stats count AS search_count by host, type, user \ +| eval prefix="platform_stats.audit." \ +| rename host AS search_head \ +| eval search_head_cluster=`search_head_cluster` \ +| addinfo \ +| rename info_max_time AS _time \ +| fields - info_* \ + ```mcollect index=a_metrics_index split=true prefix_field=prefix search_head search_head_cluster type user``` + +[SearchHeadLevel - platform_stats.audit metrics users] +action.email.useNSSubject = 1 +alert.track = 0 +cron_schedule = */10 * * * * +description = Report only? Yes. Metrics? Yes. This summary (mcollect) search attempts to measure counts of active REST/UI users via audit and internal indexes (note realtime_schedule = 0). Note: tested on 7.3 only, may not work on earlier versions +dispatch.earliest_time = -15m@m +dispatch.latest_time = -5m@m +display.events.fields = ["index","sourcetype","host","source"] +display.page.search.tab = statistics +enableSched = 0 +realtime_schedule = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = | multisearch \ + [ search ```Count the number of Splunk queries that use a search command``` \ + index=_audit ", info=granted " search_id!="'rsa_*" \ + | rex "(?s), search='(?P<search>.*)\]$" \ + ] \ + [ search ```Attempt to find all API calls into Splunk but do not include the API calls triggered by the local system (ignore localhost)``` \ + index=_internal sourcetype=splunkd_access source="*splunkd_access.log" "/search/jobs/export" OR ("/search/jobs" OR "/search/v1/jobs" OR "/search/v2/jobs" method=POST) NOT control clientip!=127.0.0.1 status=200 OR status=201 ] \ + ```Split out the information by users vs api users``` \ +| eval from=if(index=="_audit","ui","rest") \ +| stats dc(user) AS active_users by host, from \ +| eval prefix="platform_stats.audit."
\ +| rename host AS search_head \ +| eval search_head_cluster=`search_head_cluster` \ +| addinfo \ +| rename info_max_time AS _time \ +| fields - info_* \ + ```mcollect index=a_metrics_index split=true prefix_field=prefix search_head search_head_cluster from``` + +[SearchHeadLevel - platform_stats.audit metrics api] +action.email.useNSSubject = 1 +alert.track = 0 +cron_schedule = */10 * * * * +description = Report only? Yes. Metrics? Yes. This summary (mcollect) search attempts to measure search requests using the REST API via access logs (note realtime_schedule = 0). Note: tested on 7.3 only, may not work on earlier versions +dispatch.earliest_time = -15m@m +dispatch.latest_time = -5m@m +display.events.fields = ["index","sourcetype","host","source"] +display.page.search.tab = statistics +enableSched = 0 +realtime_schedule = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = ```Attempt to find all API calls into Splunk but do not include the API calls triggered by the local system (ignore localhost)``` \ + index=_internal sourcetype=splunkd_access source="*splunkd_access.log" "/search/jobs/export" OR ("/search/jobs" OR "/search/v1/jobs" OR "/search/v2/jobs" method=POST) NOT control clientip!=127.0.0.1 status=200 OR status=201 \ +| stats count AS api_search_count by host \ +| eval prefix="platform_stats.audit." \ +| rename host AS search_head \ +| eval search_head_cluster=`search_head_cluster` \ +| addinfo \ +| rename info_max_time AS _time \ +| fields - info_* \ + ```mcollect index=a_metrics_index split=true prefix_field=prefix search_head search_head_cluster``` + +[SearchHeadLevel - platform_stats.audit metrics users 24hour] +action.email.useNSSubject = 1 +alert.track = 0 +cron_schedule = 10 6 * * * +description = Report only? Yes. Metrics? Yes. This summary (mcollect) search attempts to measure counts of active REST/UI users over a 24 hour period via the audit and internal indexes (note realtime_schedule = 0). Note: tested on 7.3 only, may not work on earlier versions +dispatch.earliest_time = -24h@h +dispatch.latest_time = @h +display.events.fields = ["index","sourcetype","host","source"] +display.page.search.tab = statistics +enableSched = 0 +realtime_schedule = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = | multisearch \ + [ search ```Count the number of Splunk queries that use a search command``` \ + index=_audit ", info=granted " search_id!="'rsa_*" \ + | rex "(?s), search='(?P<search>.*)\]$" \ + ] \ + [ search ```Attempt to find all API calls into Splunk but do not include the API calls triggered by the local system (ignore localhost)``` \ + index=_internal sourcetype=splunkd_access source="*splunkd_access.log" "/search/jobs/export" OR ("/search/jobs" OR "/search/v1/jobs" OR "/search/v2/jobs" method=POST) NOT control clientip!=127.0.0.1 status=200 OR status=201 ] \ + ```Split out the information by users vs api users``` \ +| eval from=if(index=="_audit","ui","rest")\ +| eval search_head=host\ +| eval search_head_cluster=`search_head_cluster`\ +| stats dc(user) AS active_users_24hour, dc(host) AS host_count by search_head_cluster, from\ +| eval prefix="platform_stats.audit." \ +| addinfo\ +| rename info_max_time AS _time\ +| fields - info_*\ + ```mcollect index=a_metrics_index split=true prefix_field=prefix search_head search_head_cluster from``` + +[SearchHeadLevel - platform_stats.user_stats.introspection metrics populating search] +action.email.useNSSubject = 1 +alert.track = 0 +cron_schedule = 33 * * * * +description = Report only? Yes.
Metrics? Yes. This summary (mcollect) search attempts to find user metrics around CPU usage, indexer impact et cetera from the introspection index (note realtime_schedule = 0). Note: tested on 7.3 only, may not work on earlier versions +dispatch.earliest_time = -65m@m +dispatch.latest_time = -5m@m +display.events.fields = ["index","sourcetype","host"] +display.general.type = statistics +enableSched = 0 +realtime_schedule = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = index=_introspection `indexerhosts` sourcetype=splunk_resource_usage data.search_props.sid::* data.search_props.type!=other\ +| eval mem_used = 'data.mem_used' \ +| eval app = 'data.search_props.app' \ +| eval elapsed = 'data.elapsed' \ +| eval label = 'data.search_props.label' \ +| eval intro_type = 'data.search_props.type' \ +| eval mode = 'data.search_props.mode' \ +| eval user = 'data.search_props.user' \ +| eval cpuperc = 'data.pct_cpu' \ +| eval search_head = 'data.search_props.search_head' \ +| eval read_mb = 'data.read_mb' \ +| eval provenance='data.search_props.provenance' \ +| eval label=coalesce(label, provenance) \ +| eval sid='data.search_props.sid' \ +| rex field=sid "^remote_(?P<search_id_local>.*)" \ +| eval server_with_underscore = search_head . "_" \ +| eval search_id_local=replace(search_id_local, server_with_underscore, "") \ +| eval sid = "'" . sid . "'" \ +| `search_type_from_sid(search_id_local)` \ +| eval type=case(intro_type=="ad-hoc",if(type=="dashboard","dashboard",intro_type),1=1,intro_type) \ +| stats max(elapsed) as runtime max(mem_used) as mem_used, sum(cpuperc) AS totalCPU, avg(cpuperc) AS avgCPU, max(read_mb) AS read_mb, values(sid) AS sids by type, mode, app, user, label, host, search_head, data.pid\ +| eval type=replace(type," ","-")\ +| eval search_head_cluster=`search_head_cluster`\ +| eval indexer_cluster=`indexer_cluster_name(host)` \ +| stats dc(sids) AS search_count, sum(totalCPU) AS total_cpu, sum(mem_used) AS total_mem_used, max(runtime) AS max_runtime, avg(runtime) AS avg_runtime, avg(avgCPU) AS avgcpu_per_indexer, sum(read_mb) AS read_mb, values(app) AS app by type, user, search_head_cluster\ +| eval prefix="user_stats.introspection."\ +| addinfo \ +| rename info_max_time AS _time \ +| fields - info_* \ +| foreach user_stats.introspection.* [eval <<FIELD>>=round('<<FIELD>>',2)] \ +| fillnull \ + ```mcollect index=a_metrics_index split=true prefix_field=prefix search_head_cluster, type, user, indexer_cluster, app``` + +[SearchHeadLevel - platform_stats access summary] +action.email.useNSSubject = 1 +alert.track = 0 +cron_schedule = 15 * * * * +description = Report only? Yes. This report provides information around the Splunk access UI logs such as dashboard, report or loads of various Splunk pages, perfect for summary indexing... +dispatch.earliest_time = -60m@m +dispatch.latest_time = now +display.events.fields = ["index","sourcetype","host","source"] +display.general.type = statistics +enableSched = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = | multisearch \ + [ search index=_internal `searchheadhosts` sourcetype=splunk_web_access GET "/app/" status=200 \ + | regex _raw="^[^\]]+\] \"GET /[^/]+/app/" \ + | regex _raw!="^[^\]]+\] \"GET /[^/]+/app/([^/ ]+/?)? 
HTTP" \ + | rex "GET /[^/]+/app/(?<app>[^/ ?]+)/(?P<view>[^/ ?]+)" \ + | eval decoded_uri_query=urldecode(uri_query) \ + | rex field=decoded_uri_query "/saved/searches/(?P<report>[^&]+)" \ + | eval report=urldecode(report), source="splunk_web_access" ] \ + [ search index=_internal `searchheadhosts` method=GET sourcetype=splunkd_ui_access \ + | regex uri="^(/([^/]+/){2}__raw/services/search/jobs\?output_mode=json&id=)|(/([^/]+/){2}__raw/servicesNS/([^/]+/){2}search/jobs/[^\?/]+\?output)" \ + | rex field=uri "id=(?P<sid>[^&]+)" max_match=20 \ + | eval app=null(), report=null(), view=null() \ + | rex field=uri "^/([^/]+/){2}__raw/servicesNS/([^/]+/)(?P<app>[^/]+)/search/jobs/(?P<sid_2>[^\?]+)\?output" \ + | eval sid=coalesce(sid,sid_2), prebintime=_time, source="splunkd_ui_access" \ + | bin _time span=2m] \ + [ search index=_internal `searchheadhosts` method=POST status=201 sourcetype=splunkd_ui_access \ + | regex uri="/saved/searches/[^/]+/dispatch$" \ + | rex field=uri "(/[^/]+){5}/(?P<app>[^/]+)(/saved/searches/(?P<report>[^/]+))?" \ + | eval view="N/A", report=report, source="splunkd_ui_access_dispatch", report=urldecode(report) ] \ + ``` this search captures the REST API hits to the reports/views, it makes the results more noisy and can be excluded if you want to see what users are actually viewing. However if a dashboard loads reports via loadjob, the reports may only appear indirectly in the splunkd access logs. However since we already have __raw this is probably overkill. /notify POSTs appear to be savedsearch completion ``` [ search index=_internal `searchheadhosts` sourcetype=splunkd_access "/data/ui/views/" OR "/saved/searches/" status=200 NOT "/notify" \ + | rex field=uri_path "/servicesNS/[^/]+/(?P<app>[^/]+)/data/ui/views/(?P<view>[^\s\?]+)" \ + | rex field=uri_path "/servicesNS/[^/]+/(?P<app>[^/]+)/saved/searches/(?P<report>[^\s\?/]+)" \ + | eval report=urldecode(report) \ + | eval source="splunkd_access" \ + | where isnotnull(app) ] \ +| fillnull sid, view value="N/A" \ +| eval prebintime=coalesce(prebintime,_time) \ +| stats earliest(prebintime) AS prebintime, max(spent) AS spent, values(app) AS app, values(report) AS report, values(source) AS source by sid, _time, user, useragent, host, view, env \ +| rex field=sid "(rt_)?(subsearch_)*(?P<from>[^_]+)((_(?P<base64username>[^_]+))|(__(?P<username>[^_]+)))((__(?P<app2>[^_]+)__(?P<report2>[^_]+))|(_(?P<base64appname>[^_]+)__(?P<report3>[^_]+)))" \ +| `base64decode(base64appname)` \ +| eval app3="N/A" \ +| eval report=coalesce(report,report2,report3), app=coalesce(app,app2,base64appname,app3) \ +| eval _time=prebintime \ +| eval comment="RMD appears to be an encoded report name that only appears in audit.log, scheduler.log and sometimes remote_searches.log. Not creating yet another lookup for this..." \ +| eval report=if(match(report,"^RMD"),"N/A",report) \ +| eval report=case(from=="scheduler",report . "_scheduler",isnotnull(report3) OR isnotnull(report2),report . "_dashboard",match(sid,"^\d+\."),"N/A_adhoc",1=1,report) \ +| fillnull value="N/A" view, report \ +| stats count by _time, app, view, report, spent, user, useragent, host, env, source \ +| table _time, app, view, report, spent, user, useragent, sourceHost, env, source, count + +[SearchHeadLevel - platform_stats.remote_searches metrics populating search] +action.email.useNSSubject = 1 +alert.track = 0 +cron_schedule = 38 * * * * +description = Report only? Yes. Metrics? Yes. 
+[SearchHeadLevel - platform_stats.remote_searches metrics populating search] +action.email.useNSSubject = 1 +alert.track = 0 +cron_schedule = 38 * * * * +description = Report only? Yes. Metrics? Yes. This summary (mcollect) search attempts to find stats via the remote_searches.log on the indexing tier (useful if you do not have audit logs for all search heads) (note realtime_schedule = 0). Note: tested on 7.3 only, may not work on earlier versions +dispatch.earliest_time = -65m@m +dispatch.latest_time = -5m@m +display.events.fields = ["index","sourcetype","host"] +display.general.type = statistics +enableSched = 0 +realtime_schedule = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = ```Attempt to gather stats from the remote_searches log on the indexing tier relating to the searches from various search heads. These may include search heads where we do not see the _audit index. Added regex to ignore the strange presummarize that comes in from search heads that do not have accelerated reports...``` \ +index=_internal `indexerhosts` sourcetype=splunkd_remote_searches source="/opt/splunk/var/log/splunk/remote_searches.log" terminated: OR closed: ```Note that TERM(starting) has the apiStartTime, apiEndTime stats, but lacks the useful stats from a search that is complete. Also note that on indexers scan_count=events_count (in my testing). Finally various fields failed to auto-extract so regexes are used now, perhaps due to the length of some searches...``` \ +| rex "(?s) elapsedTime=(?P<elapsedTime>[0-9\.]+), search='(?P<search>.*?)(', savedsearch_name|\", drop_count=\d+)" \ +| regex search!="^(pretypeahead|copybuckets)" \ +| regex search!="^presummarize (tstats=t maintain=\"\" summaryprefix=\"[^\"]+\"|maintain=\"%22SUMMARY_ID%22%2C%22EARLIEST_TIME%22%2C%22REMOTE_SEARCH%22%2C%22NORM_SUMMARY_ID%22%2C%22NORM_REMOTE_SEARCH%22%0A\" summaryprefix=\"[^\"]+\")\s*$" \ +| rex "drop_count=[0-9]+, scan_count=(?P<scan_count>[0-9]+)" \ +| rex "(,|}\.\.\.) savedsearch_name=\"(?P<savedsearch_name>[^\"]*)\"," \ +| rex "(terminated|closed): search_id=(?P<search_id>[^,]+)" \ +| eval indexer_cluster=`indexer_cluster_name(host)` \ +| rex "search_id=[^,]+,\s+server=(?P<server>[^,]+)" \ +| rename server AS search_head \ +| eval search_head_cluster=`search_head_cluster` \ +| fillnull savedsearch_name value="" \ +| rex field=search_id "^remote_(?P<sid>.*)" \ +| eval server_with_underscore = search_head. "_" \ +| eval sid=replace(sid, server_with_underscore, "") \ +| `search_type_from_sid(sid)` \ +| eval type=if(match(search,"^presummarize"),"acceleration",type) \ +| eval user=if(username=="nobody" OR username=="admin" OR (type=="acceleration" AND isnull(username)),"system","other") \ +| stats dc(search_id) AS search_count, max(elapsedTime) AS max_elapsed_time, avg(elapsedTime) AS avg_elapsed_time, sum(scan_count) AS total_scan_count by search_head, indexer_cluster, search_head_cluster, type, user \ +| eval prefix="platform_stats.remote_searches." \ +| addinfo \ +| rename info_max_time AS _time \ +| fields - info_* \ + ```mcollect index=a_metrics_index split=true prefix_field=prefix search_head, search_head_cluster, indexer_cluster, type, user``` + +[IndexerLevel - RemoteSearches Indexes Stats] +action.email.useNSSubject = 1 +alert.track = 0 +cron_schedule = 38 * * * * +description = Report only? Yes.
This is an example of using the remote_searches.log on the indexers to determine which indexes are in use +dispatch.earliest_time = -65m@m +dispatch.latest_time = -5m@m +display.events.fields = ["index","sourcetype","host"] +display.general.type = statistics +enableSched = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = ```Attempt to determine index access via the remote_searches.log file; useful when you cannot see the audit logs of all incoming search heads```\ + index=_internal sourcetype=splunkd_remote_searches source="/opt/splunk/var/log/splunk/remote_searches.log" terminated: OR closed: ```Note that TERM(starting) has the apiStartTime, apiEndTime stats, but lacks the useful stats from a search that is complete. Also note that on indexers scan_count=events_count (in my testing). Finally the elapsedTime sometimes failed to auto-extract, perhaps due to length...``` \ +| rex "(?s) elapsedTime=(?P<elapsedTime>[0-9\.]+), search='(?P<search>.*?)(', savedsearch_name|\", drop_count=\d+)" \ +| regex search!="^(pretypeahead|copybuckets)" \ +| rex "drop_count=[0-9]+, scan_count=(?P<scan_count>[0-9]+)" \ +| rex "total_slices=[0-9]+, considered_buckets=(?P<considered_count>[0-9]+)" \ +| rex "(,|}\.\.\.) savedsearch_name=\"(?P<savedsearch_name>[^\"]*)\"," \ +| rex "(terminated|closed): search_id=(?P<search_id>[^,]+)" \ +| regex search="^(litsearch|mcatalog|mstats|mlitsearch|litmstats|tstats|presummarize)" \ +| rex field=search max_match=50 "(?s)\|?\s*(mlitsearch)\s+.*?\[(?P<subsearch>.*?)\]\s*(\||$)" \ +| rex field=search "(?s)(?P<prepipe>\s*\|?([^\|]+))" \ +| nomv subsearch \ +| eval subsearch=if(isnull(subsearch),"",subsearch) \ +| eval prepipe = prepipe . " " . subsearch \ +| eval search=prepipe \ + ```The (index=* OR index=_*) index=<specific index> is a common use case for Enterprise Security; some individuals also like doing a similar trick, so remove the index=*... as this is not a wildcard index search``` \ +| rex field=search "(?P<esstylewildcard>\(\s*index=\*\s+OR\s+index=_\*\s*\))" \ +| rex mode=sed field=search "s/search index=\s*\S+\s+index\s*=/search index=/" \ + ```Extract out index= or index IN (a,b,c) but avoid NOT index in (...) and NOT index=... and also NOT (...anything) statements``` \ +| rex field=search "(?s)(NOT\s+index(\s*=\s*|::)[^ ]+)|(NOT\s+\([^\)]+\))|(index(\s*=\s*|::)(?P<indexregex>[\*A-Za-z0-9-_]+))" max_match=50 \ +| rex field=search "(?s)(NOT\s+index(\s*=\s*|::)[^ ]+)|(NOT\s+\([^\)]+\))|(index(\s*=\s*|::)\"?(?P<indexregex2>[\*A-Za-z0-9-_]+))" max_match=50 \ +| rex field=search "\s+(?P<skipping>\.\.\.\{skipping \d+ bytes\}\.\.\.)" \ + ```If skipping is in the logs as in index=abc- ...{skipping 46464 bytes}..., then drop the last index found in the regex as it is likely invalid``` \ +| eval indexregex=if(isnotnull(skipping),mvindex(indexregex,0,-2),indexregex) \ +| eval indexregex2=if(isnotnull(skipping),mvindex(indexregex2,0,-2),indexregex2) \ +| eval indexes=mvappend(indexregex,indexregex2) \ +| eval indexes=if(isnotnull(esstylewildcard),mvfilter(NOT match(indexes,"^_?\*$")),indexes) \ +| eval multi=if(mvcount(mvdedup(indexes))>1,"true","false") \ +| rex field=search_id "^remote_(?P<sid>.*)" \ +| rex "search_id=[^,]+,\s+server=(?P<server>[^,]+)" \ +| eval server_with_underscore = server.
"_" \ +| eval sid=replace(sid, server_with_underscore, "") \ +| eval search_head=server \ +| `search_type_from_sid(sid)` \ +| `base64decode(base64username)` \ +| eval username3="unknown" \ +| eval user=coalesce(username, base64username, username3) \ +| rex field=search "^(?P<presummarize>presummarize)\s+" \ +| eval type=if(isnotnull(presummarize),"acceleration",type) \ +| eval search_head_cluster=`search_head_cluster` \ +| eval indexer_cluster=`indexer_cluster_name(host)` \ + ```If you use the TERM(starting) you get the apiStartTime/apiEndTime, or you could join them in stats or similar...however this works to obtain which indexes are used. Note that you would need to build something similar to 'SearchHeadLevel - Search Queries summary non-exact match' to be able to translate the wildcards into something more useful, but there would be a lot of guesswork involved if you do not have usernames+server names+roles...(which is why audit logs work better for this)```\ +| rex "search_rawdata_bucketcache_error=[^,]+, search_rawdata_bucketcache_miss=(?P<cache_rawdata_miss>[^,]+), search_index_bucketcache_error=[^,]+, search_index_bucketcache_hit=(?P<cache_index_hit>[^,]+), search_index_bucketcache_miss=(?P<cache_index_miss>[^,]+), search_rawdata_bucketcache_hit=(?P<cache_rawdata_hit>[^,]+), search_rawdata_bucketcache_miss_wait=(?P<cache_rawdata_miss_wait>[^,]+), search_index_bucketcache_miss_wait=(?P<cache_index_miss_wait>[^,]+)" \ +| `base64decode(base64appname)` \ +| eval app3="N/A" \ +| eval app=coalesce(app,base64appname,app3) \ +| stats dc(search_id) AS count, avg(elapsedTime) AS avg_total_run_time, max(elapsedTime) AS max_total_run_time, median(elapsedTime) AS median_total_run_time, avg(scan_count) AS avg_scan_count, max(scan_count) AS max_scan_count, min(scan_count) AS min_scan_count, median(scan_count) AS median_scan_count, sum(cache_rawdata_miss) AS cache_rawdata_miss, sum(cache_index_hit) AS cache_index_hit, sum(cache_index_miss) AS cache_index_miss, sum(cache_rawdata_hit) AS cache_rawdata_hit, sum(cache_rawdata_miss_wait) AS cache_rawdata_miss_wait, sum(cache_index_miss_wait) AS cache_index_miss_wait by user, search_head_cluster, indexes, indexer_cluster, type, multi, app \ +| eval indexes=lower(indexes) \ +| regex indexes!="\*" \ +| stats sum(count) AS count, avg(avg_total_run_time) AS avg_total_run_time, max(max_total_run_time) AS max_total_run_time, median(median_total_run_time) AS median_total_run_time, avg(avg_scan_count) AS avg_scan_count, max(max_scan_count) AS max_scan_count, min(min_scan_count) AS min_scan_count, median(median_scan_count) AS median_scan_count, sum(cache_rawdata_miss) AS cache_rawdata_miss, sum(cache_index_hit) AS cache_index_hit, sum(cache_index_miss) AS cache_index_miss, sum(cache_rawdata_hit) AS cache_rawdata_hit, sum(cache_rawdata_miss_wait) AS cache_rawdata_miss_wait, sum(cache_index_miss_wait) AS cache_index_miss_wait by indexes, indexer_cluster, user, search_head_cluster, type, multi, app \ +| eval prefix="platform_stats.remote_searches.per_index.exact." \ +| addinfo \ +| rename info_max_time AS _time \ +| fields - info_* \ +| eval short="False" \ + ```| mcollect index=a_metrics_index split=true prefix_field=prefix search_head_cluster, indexer_cluster, type, user, indexes, multi, app, short. 
If using Splunk 8.0.x, delete the lines below and use mcollect; if not, you can use summary indexing with metrics``` \ +| rename * AS platform_stats.remote_searches.per_index.exact.* \ +| rename platform_stats.remote_searches.per_index.exact.search_head_cluster AS search_head_cluster platform_stats.remote_searches.per_index.exact.indexer_cluster AS indexer_cluster, platform_stats.remote_searches.per_index.exact.type AS type, platform_stats.remote_searches.per_index.exact.user AS user, platform_stats.remote_searches.per_index.exact.indexes AS indexes, platform_stats.remote_searches.per_index.exact.multi AS multi, platform_stats.remote_searches.per_index.exact.short AS short, platform_stats.remote_searches.per_index.exact.app AS app \ +| fields - prefix + +[SearchHeadLevel - Summary searches using realtime search scheduling] +action.email.useNSSubject = 1 +alert.track = 0 +cron_schedule = 38 * * * * +description = Report only? Yes. Searches using collect commands should likely use realtime_schedule=0; there are also issues with the UI of some Splunk versions not setting this value automatically +dispatch.earliest_time = -65m@m +dispatch.latest_time = -5m@m +display.events.fields = ["index","sourcetype","host"] +display.general.type = statistics +enableSched = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = | rest "/servicesNS/-/-/saved/searches" count=0 `splunkadmins_restmacro` f=search f=realtime_schedule f=eai:* f=action.summary_index \ + ```Find reports running summary indexing that are not scheduled using continuous scheduling (realtime_schedule=0)```\ +| where realtime_schedule=1 \ +| rex field=search "(?s)\|\s*(?P<command>mcollect|meventcollect|collect)\s+" \ +| where isnotnull(command) OR 'action.summary_index'==1 \ +| rename eai:acl.app AS app, eai:acl.owner AS owner \ +| table title, app, owner, search, action.summary_index, id, updated + +[SearchHeadLevel - Searches dispatched as owner by other users] +action.email.useNSSubject = 1 +alert.track = 0 +cron_schedule = 38 * * * * +description = Report only? Yes. This report only advises which reports are being used via dispatch and running as the owner (not as the user); this is standard functionality in Splunk... +dispatch.earliest_time = -65m@m +dispatch.latest_time = -5m@m +display.events.fields = ["index","sourcetype","host"] +display.general.type = statistics +enableSched = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = index=_audit info=granted search_id!="'rsa_*" ```Report on searches dispatched with owner (not user) setting and the search involved``` \ +| rex "info=granted , search_id='(?P<search_id>[^']+)" \ + ```A regex that includes what a userid looks like within your company will likely be faster than trying to exclude all alternatives like the below...``` \ +| regex search_id!="^((subsearch_)?\d|rt_|(subsearch_)?scheduler|SummaryDirector_|ta_|RemoteStorageRetrieveIndexes_|md_|subsearch_searchparsetmp| alertsmanager_|subsearch_AlertActionsRequredFields|alertsmanager_|sd_)" \ +| rex field=search_id "(?P<from>[^_]+)((_(?P<base64owner>[^_]+))|(__(?P<owner>[^_]+)))" \ +| `base64decode(base64owner)` \ +| eval owner=coalesce(owner,base64owner) \ +| where from!=owner AND user!=from \ +| rex "(?s), search='(?P<search>.*)\]$" \ +| rex "', savedsearch_name=\"(?P<savedsearch_name>[^\"]*)" \ +| table from, owner, savedsearch_name, search +
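+# Relating to the realtime search scheduling report above: switching a summary-indexing report
+# to continuous scheduling is a one-line savedsearches.conf change (stanza name illustrative):
+# [your summary indexing search]
+# realtime_schedule = 0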
+[SearchHeadLevel - DataModel Fields] +action.email.useNSSubject = 1 +alert.track = 0 +description = Report only? Yes. This report prints out datamodel fields; found on Slack thanks to Dave Shpritz (@automine) +dispatch.earliest_time = -65m@m +dispatch.latest_time = -5m@m +display.events.fields = ["index","sourcetype","host"] +display.general.type = statistics +enableSched = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = | rest /servicesNS/-/-/data/models `splunkadmins_restmacro` | table title \ +| map maxsearches=30 search=" \ +|datamodel \"$title$\" \ +| spath output=model_name path=modelName \ +| spath output=foo path=objects{} \ +| mvexpand foo \ +| spath input=foo output=object_name path=objectName \ +| spath input=foo output=bar path=calculations{} \ +| spath input=foo output=foo path=fields{} \ +| table model_name,object_name,bar,foo \ +| eval foobar = mvappend(foo,bar) \ +| table model_name, object_name,foobar \ +| mvexpand foobar \ +| spath input=foobar output=field_name path=fieldName \ +| spath input=foobar output=field_type path=type \ +| spath input=foobar output=calc_type path=outputFields{}.type \ +| spath input=foobar output=calc_name path=outputFields{}.fieldName \ +| spath input=foobar output=field_desc path=comment{}.description \ +| spath input=foobar output=calc_desc path=outputFields{}.comment{}.description \ +| spath input=foobar output=field_recommended path=comment{}.recommended \ +| spath input=foobar output=calc_recommended path=outputFields{}.comment{}.recommended \ +| eval method=if(isnotnull(field_name),\"field\",\"eval\"), name=coalesce(field_name,calc_name), type=coalesce(field_type,calc_type), desc=coalesce(field_desc, calc_desc), recommended=coalesce(field_recommended, calc_recommended) \ +| fillnull value=\"false\" recommended \ +| fields model_name, object_name, name, recommended, method, type, desc" + +[SearchHeadLevel - Dashboard refresh intervals] +action.email.useNSSubject = 1 +alert.track = 0 +description = Report only? Yes. Credit to Niket Nilay (@niketnilay), with modifications; reports on dashboard refresh intervals/realtime refresh intervals +dispatch.earliest_time = -65m@m +dispatch.latest_time = -5m@m +display.events.fields = ["index","sourcetype","host"] +display.general.type = statistics +enableSched = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = | rest `splunkadmins_restmacro` servicesNS/-/-/data/ui/views timeout=600 \ +| regex eai:data="(<earliest>rt-\d+[^\<]+|<refresh>\d+|refresh=\"[^\"]+)" \ +| rex field="eai:data" "(?s)<refresh>(?<refresh_time>\d+[^\<]+)<\/refresh>" max_match=30 \ +| rex field="eai:data" "(?s)refresh=\"(?<refresh_time>[^\"]+)\"" max_match=30 \ +| rex field="eai:data" "(?s)<earliest>rt-(?<refresh_time>\d+[^\<]+)" max_match=30 \ +| stats values(refresh_type) AS refresh_type by eai:appName, eai:acl.app, eai:acl.sharing, label, title, refresh_time \ +| rex field=refresh_time "^\d+(?P<refresh_unit>.*)" \ +| eval refresh_type=case(isnotnull(refresh_unit) AND match('eai:data',"<earliest>rt-\d+[^\<]+"),"RealTime",isnotnull(refresh_unit),"Search",1=1,"Form") \ +| addinfo \ +| eval refresh_time_seconds=if(isnotnull(refresh_unit),relative_time(info_search_time, "-" . refresh_time),refresh_time) \ +| eval refresh_time_seconds=if(isnotnull(refresh_unit),floor((refresh_time_seconds-info_search_time)*-1),refresh_time) \ +| fields - info_*, refresh_unit + +[SearchHeadLevel - Dashboards using depends and running searches in the background] +action.email.useNSSubject = 1 +alert.track = 0 +description = Report only? Yes.
For any depends= attributes in a dashboard, check whether the searches below that level also depend on the token (in case they are loading/searching in the background even when not visible). Note that this search utilises the streamfilter custom search command included in the TA-Alerts for SplunkAdmins application on SplunkBase (or github) +dispatch.earliest_time = -65m@m +dispatch.latest_time = -5m@m +display.events.fields = ["index","sourcetype","host"] +display.general.type = statistics +enableSched = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = | rest /servicesNS/-/-/data/ui/views `splunkadmins_restmacro` \ +| search eai:acl.app!=splunk_monitoring_console `dashboard_depends_filter1`\ +| regex eai:data="depends\s*=\"" \ +| fields eai:data, label, title, eai:acl.app \ +| rex field=eai:data max_match=20 "(?s)<row\s*(depends\s*=\"(?P<rowtokens>[^\"]+)|[^>]+depends\s*=\"(?P<rowtokens2>[^\"]+))(?P<row>.*?)</row>" \ +| rex field=eai:data max_match=20 "(?s)<panel\s*(depends\s*=\"(?P<paneltokens>[^\"]+)|[^>]+depends\s*=\"(?P<paneltokens2>[^\"]+))(?P<panel>.*?)</panel>" \ +| rex field=eai:data max_match=20 "(?s)<chart\s*(depends\s*=\"(?P<charttokens>[^\"]+)|[^>]+depends\s*=\"(?P<charttokens2>[^\"]+))(?P<chart>.*?)</chart>" \ +| rex field=eai:data max_match=20 "(?s)<event\s*(depends\s*=\"(?P<eventtokens>[^\"]+)|[^>]+depends\s*=\"(?P<eventtokens2>[^\"]+))(?P<event>.*?)</event>" \ +| rex field=eai:data max_match=20 "(?s)<map\s*(depends\s*=\"(?P<maptokens>[^\"]+)|[^>]+depends\s*=\"(?P<maptokens2>[^\"]+))(?P<map>.*?)</map>" \ +| rex field=eai:data max_match=20 "(?s)<single\s*(depends\s*=\"(?P<singletokens>[^\"]+)|[^>]+depends\s*=\"(?P<singletokens2>[^\"]+))(?P<single>.*?)</single>" \ +| rex field=eai:data max_match=20 "(?s)<table\s*(depends\s*=\"(?P<tabletokens>[^\"]+)|[^>]+depends\s*=\"(?P<tabletokens2>[^\"]+))(?P<table>.*?)</table>" \ +| rex field=eai:data max_match=20 "(?s)<viz\s*(depends\s*=\"(?P<viztokens>[^\"]+)|[^>]+depends\s*=\"(?P<viztokens2>[^\"]+))(?P<viz>.*?)</viz>" \ +| rex field=row "(?s)(?P<rowsearch><search[^>]*>.*?</search>)" \ +| rex field=panel "(?s)(?P<panelsearch><search[^>]*>.*?</search>)" \ +| rex field=chart "(?s)(?P<chartsearch><search[^>]*>.*?</search>)" \ +| rex field=event "(?s)(?P<eventsearch><search[^>]*>.*?</search>)" \ +| rex field=map "(?s)(?P<mapsearch><search[^>]*>.*?</search>)" \ +| rex field=single "(?s)(?P<singlesearch><search[^>]*>.*?</search>)" \ +| rex field=table "(?s)(?P<tablesearch><search[^>]*>.*?</search>)" \ +| rex field=viz "(?s)(?P<vizsearch><search[^>]*>.*?</search>)" \ +| eval rowsearchfiltered=mvfilter(match(rowsearch,"<search.*?\s+depends\s*=")) \ +| eval panelsearchfiltered=mvfilter(match(panelsearch,"<search.*?\s+depends\s*=")) \ +| eval chartsearchfiltered=mvfilter(match(chartsearch,"<search.*?\s+depends\s*=")) \ +| eval eventsearchfiltered=mvfilter(match(eventsearch,"<search.*?\s+depends\s*=")) \ +| eval mapsearchfiltered=mvfilter(match(mapsearch,"<search.*?\s+depends\s*=")) \ +| eval singlesearchfiltered=mvfilter(match(singlesearch,"<search.*?\s+depends\s*=")) \ +| eval tablesearchfiltered=mvfilter(match(tablesearch,"<search.*?\s+depends\s*=")) \ +| eval vizsearchfiltered=mvfilter(match(vizsearch,"<search.*?\s+depends\s*=")) \ +| where (isnotnull(rowsearch) AND (isnotnull(rowsearchfiltered) AND mvcount(rowsearch)!=mvcount(rowsearchfiltered)) OR (isnull(rowsearchfiltered))) \ + OR (isnotnull(panelsearch) AND (isnotnull(panelsearchfiltered) AND mvcount(panelsearch)!=mvcount(panelsearchfiltered)) OR
(isnull(panelsearchfiltered)))\ + OR (isnotnull(chartsearch) AND (isnotnull(chartsearchfiltered) AND mvcount(chartsearch)!=mvcount(chartsearchfiltered)) OR (isnull(chartsearchfiltered)))\ + OR (isnotnull(eventsearch) AND (isnotnull(eventsearchfiltered) AND mvcount(eventsearch)!=mvcount(eventsearchfiltered)) OR (isnull(eventsearchfiltered))) \ + OR (isnotnull(mapsearch) AND (isnotnull(mapsearchfiltered) AND mvcount(mapsearch)!=mvcount(mapsearchfiltered)) OR (isnull(mapsearchfiltered)))\ + OR (isnotnull(singlesearch) AND (isnotnull(singlesearchfiltered) AND mvcount(singlesearch)!=mvcount(singlesearchfiltered)) OR (isnull(singlesearchfiltered)))\ + OR (isnotnull(tablesearch) AND (isnotnull(tablesearchfiltered) AND mvcount(tablesearch)!=mvcount(tablesearchfiltered)) OR (isnull(tablesearchfiltered)))\ + OR (isnotnull(vizsearch) AND (isnotnull(vizsearchfiltered) AND mvcount(vizsearch)!=mvcount(vizsearchfiltered)) OR (isnull(vizsearchfiltered))) \ +| eval rowtokens=coalesce(rowtokens,rowtokens2)\ +| nomv rowtokens\ +| eval rowtokens=replace(rowtokens,"\$","\\$")\ +| makemv tokenizer=(\S+) rowtokens\ +| eval rowtokens=split(rowtokens,",") \ +| eval paneltokens=coalesce(paneltokens,paneltokens2) \ +| nomv paneltokens\ +| eval paneltokens=replace(paneltokens,"\$","\\$")\ +| makemv tokenizer=(\S+) paneltokens\ +| eval paneltokens=split(paneltokens,",") \ +| eval charttokens=coalesce(charttokens,charttokens2)\ +| nomv charttokens\ +| eval charttokens=replace(charttokens,"\$","\\$")\ +| makemv tokenizer=(\S+) charttokens\ +| eval charttokens=split(charttokens,",") \ +| eval eventtokens=coalesce(eventtokens,eventtokens2)\ +| nomv eventtokens\ +| eval eventtokens=replace(eventtokens,"\$","\\$")\ +| makemv tokenizer=(\S+) eventtokens\ +| eval eventtokens=split(eventtokens,",") \ +| eval maptokens=coalesce(maptokens,maptokens2)\ +| nomv maptokens\ +| eval maptokens=replace(maptokens,"\$","\\$")\ +| makemv tokenizer=(\S+) maptokens\ +| eval maptokens=split(maptokens,",") \ +| eval singletokens=coalesce(singletokens,singletokens2)\ +| nomv singletokens\ +| eval singletokens=replace(singletokens,"\$","\\$")\ +| makemv tokenizer=(\S+) singletokens\ +| eval singletokens=split(singletokens,",") \ +| eval tabletokens=coalesce(tabletokens,tabletokens2)\ +| nomv tabletokens\ +| eval tabletokens=replace(tabletokens,"\$","\\$")\ +| makemv tokenizer=(\S+) tabletokens\ +| eval tabletokens=split(tabletokens,",") \ +| eval viztokens=coalesce(viztokens,viztokens2)\ +| nomv viztokens\ +| eval viztokens=replace(viztokens,"\$","\\$")\ +| makemv tokenizer=(\S+) viztokens\ +| eval viztokens=split(viztokens,",") \ + ```This would be more accurate but would make the search really, really slow... so combining into 1 large value...\ +| streamfilter fieldname=rowtoken_matches pattern=rowtokens rowsearch\ +| streamfilter fieldname=paneltoken_matches pattern=paneltokens panelsearch\ +| streamfilter fieldname=chartoken_matches pattern=charttokens chartsearch\ +| streamfilter fieldname=eventtoken_matches pattern=eventtokens eventsearch\ +| streamfilter fieldname=maptoken_matches pattern=maptokens mapsearch\ +| streamfilter fieldname=singletoken_matches pattern=singletokens singlesearch\ +| streamfilter fieldname=tabletoken_matches pattern=tabletokens tablesearch\ +| streamfilter fieldname=viztoken_matches pattern=viztokens vizsearch\ + ```\ +| eval combined=mvappend(rowsearch, panelsearch, chartsearch, eventsearch, mapsearch, singlesearch, tablesearch, vizsearch)\ +| eval combinedtokens=mvappend(rowtokens, paneltokens, charttokens, eventtokens, maptokens, singletokens, tabletokens,
+
+[SearchHeadLevel - SavedSearches using special characters]
+action.email.useNSSubject = 1
+alert.track = 0
+description = Report only? Yes. Find special characters in saved searches; they are often copied & pasted in from another application and can break searches
+dispatch.earliest_time = -65m@m
+dispatch.latest_time = -5m@m
+display.events.fields = ["index","sourcetype","host"]
+display.general.type = statistics
+enableSched = 0
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = | rest "/servicesNS/-/-/saved/searches" `splunkadmins_restmacro` \
+| regex search!="(?s)^[\\\d\s\w\|`\"\*\(\)\[\]\+=\-;:!,\./%\?<>{}^#$@'&~]+$" \
+| rex field=search "(?s)(?P<before_special_character>^[\\\d\s\w\|`\"\*\(\)\[\]\+=\-;:!,\./%\?<>{}^#$@'&~]*)(?P<special_character>.)" \
+| rex field=before_special_character "\|\s*(?P<command_before>\S+)[^\|]+$" \
+| fillnull command_before \
+| where command_before!="eval" \
+| rename eai:acl.owner AS owner, eai:acl.sharing AS sharing, eai:acl.app AS app \
+| table title, owner, app, sharing, special_character, before_special_character
+
+[SearchHeadLevel - Dashboards using special characters]
+action.email.useNSSubject = 1
+alert.track = 0
+description = Report only? Yes. Find special characters in dashboard searches; they are often copied & pasted in from another application and can break searches
+dispatch.earliest_time = -65m@m
+dispatch.latest_time = -5m@m
+display.events.fields = ["index","sourcetype","host"]
+display.general.type = statistics
+enableSched = 0
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = | rest /servicesNS/-/-/data/ui/views `splunkadmins_restmacro` f=eai:data f=label f=title f=eai:* timeout=600 count=0\
+| fields eai:data, label, title, eai:acl.app, eai:acl.owner, eai:acl.sharing \
+| search NOT (eai:acl.app=trackme AND title=TrackMe) NOT (title=available_icons AND eai:acl.app=network-diagram-viz)\
+| spath input=eai:data \
+| fields *search*.query, title, eai:*\
+| foreach *search*.query [ eval combined = mvappend(combined,'<<FIELD>>') ]\
+| regex combined!="(?s)^[\\\d\s\w\|`\"\*\(\)\[\]\+=\-;:!,\./%\?<>{}^#$@'&~°]+$"\
+| where isnotnull(combined)\
+| rex field=combined "(?s)(?P<before_special_character>^[\\\d\s\w\|`\"\*\(\)\[\]\+=\-;:!,\./%\?<>{}^#$@'&~]*)(?P<special_character>.)" max_match=0\
+| eval combined_fields=mvzip(combined, before_special_character,"%%%%")\
+| eval combined_fields=mvzip(combined_fields, special_character,"%%%%")\
+| eval combined_fields=mvfilter(!match(combined_fields, "(?s)^[\\\d\s\w\|`\"\*\(\)\[\]\+=\-;:!,\./%\?<>{}^#$@'&~]+$"))\
+| mvexpand combined_fields\
+| makemv delim="%%%%" combined_fields\
+| eval query=mvindex(combined_fields,0)\
+| eval before_special_character=mvindex(combined_fields,1)\
+| eval special_character=mvindex(combined_fields,2)\
+| rex field=before_special_character "\|\s*(?P<command_before>\S+)[^\|]+$"\
+| fillnull command_before\
+| where command_before!="eval" ``` eval is normally pretty safe to use with special characters and less likely to be a mistake ```\
+| decrypt field=special_character hex() emit('hex_special_character')\
+| stats list(query) AS queries, list(special_character) AS special_characters, list(before_special_character) AS before_special_character, list(hex_special_character) AS hex_special_character by title, eai:acl.app, eai:acl.owner, eai:acl.sharing\
+| rename eai:acl.app AS app, eai:acl.sharing AS sharing, eai:acl.owner AS owner
+
+[ClusterMasterLevel - excess buckets on master]
+alert.severity = 4
+alert.suppress = 0
+alert.track = 1
+counttype = number of events
+cron_schedule = */20 * * * *
+description = Chance the alert requires action? High. Cluster master specific? Yes. As of 7.3.4/8.0.1 excess buckets do not clear themselves, so once excess buckets have appeared it is recommended to clear them after some period of time. \
+Since the clear all excess buckets button can cause performance issues you may like to run: \
+splunk list excess-buckets | grep "index" | cut -d "=" -f2 > tmp.txt \
+for z in `cat tmp.txt`; do echo $z; /opt/splunk/bin/splunk remove excess-buckets ${z}; sleep 10; done ; \
+Also adjust the sleep time as appropriate for the number of buckets/cluster size... \
+Or for a more human readable version: splunk list excess-buckets | egrep "index|Total number of excess replication"| cut -d= -f2 | tac | paste -d" " - -
+dispatch.earliest_time = -24h@h
+dispatch.latest_time = now
+display.general.type = statistics
+display.page.search.tab = statistics
+enableSched = 1
+quantity = 2000
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = | rest /services/cluster/master/indexes `splunkadmins_clustermaster_host` f=total_excess* | table title, total_excess_bucket_copies, total_excess_searchable_copies \
+| addcoltotals
+disabled = 1
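+
+# A hedged shell sketch of the per-index cleanup loop from the description above (assumes a /opt/splunk
+# install on the cluster master and a scratch file path of your choosing; adjust the sleep for your cluster):
+# /opt/splunk/bin/splunk list excess-buckets | grep "index" | cut -d "=" -f2 > /tmp/excess_indexes.txt
+# while read -r idx; do
+#   echo "removing excess buckets for index=${idx}"
+#   /opt/splunk/bin/splunk remove excess-buckets "${idx}"
+#   sleep 10
+# done < /tmp/excess_indexes.txt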
+
+[AllSplunkLevel - Unexpected termination of a Splunk process windows]
+alert.severity = 5
+alert.suppress = 0
+alert.track = 1
+counttype = number of events
+cron_schedule = 9 * * * *
+description = Chance the alert requires action? High. A Splunk process on Windows was terminated multiple times (this is a retrospective alert), contributed by Chris Bell
+dispatch.earliest_time = -60m@m
+dispatch.latest_time = now
+display.general.type = statistics
+display.page.search.tab = statistics
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```A retrospective alert to advise that the Splunk process was terminated and restarted multiple times and this likely requires further investigation``` \
+index=`splunkadmins_wineventlog_index` sourcetype=WinEventLog:Application OR sourcetype=XmlWinEventLog:Application SourceName="Application Error" splunk \
+| stats count, earliest(_time) AS firstSeen, latest(_time) AS lastSeen by host, Faulting_application_path \
+| where count > `splunkadmins_unexpected_term_count` \
+| eval firstSeen = strftime(firstSeen, "%+"), lastSeen=strftime(lastSeen, "%+")
+disabled = 1
+
+[AllSplunkLevel - Unexpected termination of a Splunk process unix]
+alert.severity = 5
+alert.suppress = 0
+alert.track = 1
+counttype = number of events
+cron_schedule = */20 * * * *
+description = Chance the alert requires action? High. A Splunk process on Unix was terminated (this alert is after restart/retrospective only)
+dispatch.earliest_time = -20m@m
+dispatch.latest_time = now
+display.general.type = statistics
+display.page.search.tab = statistics
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```A retrospective alert to advise that the Splunk process was terminated and this likely requires further investigation``` \
+index=_internal sourcetype=splunkd (`splunkadmins_splunkd_source`) OR (`splunkadmins_splunkuf_source`) \
+```Expecting "FATAL ProcessRunner - Unexpected EOF from process runner child!" OR "ProcessRunner - helper process seems to have died (child killed by signal 9: Killed)!" which can occur by the OOM killer for example``` \
+"Unexpected EOF from process runner" OR "helper process seems to have died" \
+| eval event_message=coalesce(event_message,message) \
+| stats count, earliest(_time) AS firstSeen, latest(_time) AS lastSeen, values(event_message) AS event_message by host\
+| eval firstSeen = strftime(firstSeen, "%+"), lastSeen=strftime(lastSeen, "%+")
+disabled = 1
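+
+# If the unix variant fires, checking the OS logs usually confirms whether the OOM killer was responsible,
+# e.g. (assuming Linux with journalctl available):
+# dmesg | grep -i "killed process"
+# journalctl -k | grep -i oom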
+
+[IndexerLevel - strings_metadata triggering bucket rolling]
+alert.severity = 2
+alert.suppress = 0
+alert.track = 1
+counttype = number of events
+cron_schedule = 47 4 * * *
+description = Chance the alert requires action? Moderate. This relates to premature bucket rolling, so it may or may not be a high priority issue...
+dispatch.earliest_time = -24h@h
+dispatch.latest_time = now
+display.general.type = statistics
+display.page.search.tab = statistics
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```The caller=strings_metadata relates to maxMetaEntries in the indexes.conf.spec file, at the time of writing it is the maximum number of unique lines in .data files in a bucket, once exceeded the bucket is rolled so this may cause premature bucket rolling``` \
+index=_internal `indexerhosts` sourcetype=splunkd `splunkadmins_splunkd_source` caller=strings_metadata \
+| cluster showcount=true \
+| stats sum(entries) AS count, values(host) AS hosts, values(event_message) AS event_messages by idx
+disabled = 1
+
+[SearchHeadLevel - Lookup CSV size]
+action.email.useNSSubject = 1
+alert.track = 0
+cron_schedule = 38 * * * *
+description = Report only? Yes. This report will work in Splunk 7.3.3 and above as the getsize=true option is available on the REST endpoint; prior to this version the file-explorer endpoint can be used (see dashboard ... ?). Contributed by an anonymous source
+dispatch.earliest_time = -65m@m
+dispatch.latest_time = -5m@m
+display.events.fields = ["index","sourcetype","host"]
+display.general.type = statistics
+enableSched = 0
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = | rest `splunkadmins_restmacro` /servicesNS/-/-/admin/transforms-lookup getsize=true \
+| eval name = 'eai:acl.app' + "." + title \
+| rename "eai:acl.sharing" AS sharing | eval is_temporal = if(isnull(time_field),0,1) \
+| table name type is_temporal size sharing \
+| join type=left name \
+    [ rest `splunkadmins_restmacro` /servicesNS/-/-/admin/kvstore-collectionstats \
+    | table data \
+    | mvexpand data \
+    | spath input=data \
+    | table ns size \
+    | rename ns as name ] \
+| sort - size \
+| eval size=round(size/1024/1024)
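+
+# Note the left join above enriches file-based lookups with KV store collection sizes; the
+# kvstore-collectionstats endpoint returns its results as JSON under a data field, hence the
+# mvexpand/spath, and the final eval appears to convert bytes to MB. A hypothetical variant keeping
+# one decimal place:
+# | eval size=round(size/1024/1024,1)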
+
+[ForwarderLevel - Data dropping duration]
+action.email.useNSSubject = 1
+alert.track = 0
+cron_schedule = 38 * * * *
+description = Report only? Yes. This report will measure if an output queue has dropped data and the duration of the data drop. This is normally relevant when you are cloning data to more than 1 output location in outputs.conf. Note that as of 8.0.3 the pipeline dropping the data is not recorded, but the drop is per-pipeline, per-output queue; frequent "dropping" results in pausing of the TCP output queue which can cause issues for all upstream queues...
+dispatch.earliest_time = -65m@m
+dispatch.latest_time = -5m@m
+display.events.fields = ["index","sourcetype","host"]
+display.general.type = statistics
+enableSched = 0
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = index=_internal sourcetype=splunkd `splunkadmins_splunkd_source` "Queue for group" "has begun dropping" OR "has stopped dropping events" `heavyforwarderhosts` OR `indexerhosts` \
+| rex "group\s+(?P<group>\S+).*?(?P<state>(stopped|begun))" \
+ ```instead of sort 0 _time, | reverse may work in some scenarios...``` \
+| sort 0 _time\
+| streamstats current=f global=f window=1 values(state) AS prev_state, min(_time) AS start by host, group\
+| search state="stopped" AND prev_state="begun"\
+| eval duration=_time-start\
+| eval shorthost=replace(host, "^([^\.]+).*", "\1")\
+| eval combined = shorthost. "_" . group\
+| timechart limit=50 max(duration) AS duration by combined
+
+[ForwarderLevel - Channel churn issues]
+alert.severity = 2
+alert.suppress = 0
+alert.track = 1
+counttype = number of events
+cron_schedule = 47 4 * * *
+description = Chance the alert requires action? Moderate. This relates to channel churn issues and is likely no longer required on 8.0.6 and above; https://answers.splunk.com/answers/825663/why-did-ingestion-slow-way-down-after-i-added-thou.html has more details on this...
+dispatch.earliest_time = -24h@h
+dispatch.latest_time = now
+display.general.type = statistics
+display.page.search.tab = statistics
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = index=_internal `splunkadmins_metrics_source` TERM(name=pipelineinputchannel) new_channels sourcetype=splunkd ```As per https://answers.splunk.com/answers/825663/why-did-ingestion-slow-way-down-after-i-added-thou.html , having too many channels created/removed can cause issues```\
+| bin _time span=1m\
+| stats avg(new_channels) AS avg_new_channels avg(removed_channels) AS avg_removed_channels by host, _time\
+| where avg_new_channels>5000 AND avg_removed_channels>1000 \
+| stats count, max(avg_new_channels) AS max_avg_new_channels, max(avg_removed_channels) AS max_avg_removed_channels, max(_time) AS _time by host
+disabled = 1
+
+[AllSplunkLevel - TailReader Ignoring Path]
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 3
+counttype = number of events
+cron_schedule = 7 * * * *
+description = Chance the alert requires action? High. In this alert the TailReader is ignoring files, therefore if you need them to be indexed you will likely need to create a props.conf entry for the required sourcetype
+dispatch.earliest_time = -60m
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.charting.chart = area
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```Ignoring path is quite literal, the TailReader process will ignore the said log file and never index it. If you want the said file indexed then you will need to create a sourcetype for it...``` \
+index=_internal sourcetype=splunkd (`splunkadmins_splunkd_source`) OR (`splunkadmins_splunkuf_source`) "Ignoring path" earliest=-24h `splunkadmins_tailreader_ignorepath` \
+| regex path!="\.\d$" \
+| stats latest(_time) AS lastSeen, earliest(_time) AS firstSeen, last(_raw) AS lastmessage by host, path \
+| eval firstSeen=strftime(firstSeen, "%+"), lastSeen=strftime(lastSeen, "%+")
+disabled = 1
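+
+# A minimal, hypothetical props.conf sketch for a file the TailReader is ignoring (every name below is an
+# example only; pair it with an inputs.conf monitor stanza and adjust the timestamp settings for the file):
+# [source::/var/log/myapp/custom.out]
+# sourcetype = myapp:custom
+# [myapp:custom]
+# SHOULD_LINEMERGE = false
+# TIME_FORMAT = %Y-%m-%d %H:%M:%S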
+
+[SearchHeadLevel - Dashboards with all time searches set]
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 2
+counttype = number of events
+cron_schedule = 7 3 * * 1
+description = Chance the alert requires action? Low. This alert is designed to highlight dashboards that have no earliest/latest within a <search> element, and no global time picker defined, therefore it is likely that using this dashboard would result in all time searches running. The more accurate search is "SearchHeadLevel - audit logs showing all time searches". For macro substitution to work the splunkadmins_macros lookup file needs to exist. Note this is likely to generate some false alarms; I have attempted to cater for earliest= within tokens
+dispatch.earliest_time = -60m
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.charting.chart = area
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = | rest `splunkadmins_restmacro` /servicesNS/-/-/data/ui/views timeout=600 \
+```While it will be more accurate to look at the audit logs to see who is using all time in the earliest/latest fields, this is an attempt to identify dashboards that do not have a <earliest> field *or* an earliest= within the search query. There is likely room for improvement in this query but it appears to work so far...``` \
+| search NOT ((eai:acl.app="splunk_simple_xml_examples" OR eai:acl.app=splunk_app_windows_infrastructure) AND eai:acl.owner="nobody") \
+| rex field=eai:data "(?s)<input\s*(?P<time_input>[^>]*type\s*=\s*\"time[^>]+)>" \
+| eval has_global_time_picker=if(match(time_input, "token\s*="),null(),if(isnotnull(time_input),true(),null())) \
+| where isnull(has_global_time_picker) \
+| rex field=eai:data max_match=500 "(?s)<searc(?P<base>h[^>]*)>(?P<search>.*?)</search>" \
+| eval combined = mvzip(base, search, "%%%%%%%%%%") \
+ ```From the data, find tokens, if the token includes an earliest= value, find it and store it into token_name2``` \
+| multireport \
+    [| xpath field=eai:data "//input" outfield=input \
+    | eval input=mvfilter(match(input,"token\s*=\s*")) \
+    | xpath field=input "//@token" outfield=token \
+    | xpath field=input "//input" outfield=tokenremainder \
+    | makemv tokenizer=(\S+) token \
+    | eval token_combined=mvzip(token, tokenremainder, "%%%%%%%%%%") \
+    | eval token_combined=mvfilter(match(token_combined,"earliest\s*=\s*"))\
+    | eval token_name2=mvindex(split(token_combined, "%%%%%%%%%%"),0)\
+    | stats count, values(token_name2) AS time_tokens by eai:acl.app, eai:acl.sharing, eai:appName, combined, label, title, eai:acl.owner, updated ] \
+    [| stats count, values(token_name2) AS time_tokens by eai:acl.app, eai:acl.sharing, eai:appName, combined, label, title, eai:acl.owner, updated ] \
+| stats count, values(time_tokens) AS time_tokens by eai:acl.app, eai:acl.sharing, eai:appName, combined, label, title, eai:acl.owner, updated\
+| eventstats values(time_tokens) AS time_tokens by eai:acl.app, eai:acl.sharing, eai:appName, label, title, eai:acl.owner, updated\
+| eval split=split(combined,"%%%%%%%%%%") \
+| eval base=mvindex(split,0) \
+| eval search=mvindex(split,1) \
+| fields eai:acl.app, eai:acl.sharing, eai:appName, search, base, label, title, updated, eai:acl.owner, time_tokens\
+| where NOT match(base,"(base=|ref=)") AND match(search, "<query>") \
+| eval splunk_server = `splunkadmins_splunk_server_name`\
+| `splunkadmins_macro_sub("search")` \
+| `splunkadmins_macro_sub("search")` \
+| rex field=search "(?s)<query>(?P<query>.*?)</query>" \
+| rex field=query "earliest\s*=\s*(?P<earliest>\s*\S+\s)" \
+| where isnull(earliest)\
+| eval hassearch=if(match(query, "(?s)^\s*\|\s*search\s+"),1,0) \
+| where hassearch==0 AND NOT match(query, "(?s)^\s*\||^\s*<!\[CDATA\[\s*\|") \
+| regex search!="(?s)<earliest>.*?</earliest>"\
+| rex field=query "\$(?P<token>[^\$]+)\$" max_match=50\
+| nomv time_tokens\
+| eval matches=mvmap(token,if(match(time_tokens,"(^|\s+)" . token .
"(\s+|$)"),"true",null()))\ +| where isnull(matches)\ +| stats count, values(search) AS search_examples by eai:acl.app, eai:acl.sharing, label, title, updated, eai:acl.owner\ +| rename eai:acl.app AS app, eai:acl.sharing AS sharing, eai:acl.owner AS owner, title AS label\ +| table label, app, sharing, updated, owner, search_examples +disabled = 1 + +[SearchHeadLevel - audit logs showing all time searches] +action.email.useNSSubject = 1 +alert.track = 0 +cron_schedule = 38 6 * * 1 +description = Report only? Yes. This report will attempt to find all-time searches that have been run and provide information about the context in which they were run. There are various other alerts/reports that can assist in identifying them more proactively, this one reports that they have happened...Note that this is not 100% accurate as via API you can set _index_earliest without setting an earliest= as per the comments on https://ideas.splunk.com/ideas/E-I-49 +dispatch.earliest_time = -65m@m +dispatch.latest_time = -5m@m +display.events.fields = ["index","sourcetype","host"] +display.general.type = statistics +enableSched = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = ```Search the audit logs for any search that does not have a earliest time set. Note search_et is not set for canceled/failed status so deal with this later...As per the comments on https://ideas.splunk.com/ideas/E-I-49 this can miss the _index_earliest flag passed in via API but it works for most cases``` \ +index=_audit sourcetype=audittrail search_id!="rsa_*" `searchheadhosts` info="failed" OR info="completed" OR info="canceled" search=* search_et="N/A" `splunkadmins_audit_alltime` \ +| regex search="(?s)^'\s*\|?search\s+" \ +| regex search_id!="^'subsearch_" \ +| eval has_earliest=if((info="failed" OR info="canceled") AND api_et!="N/A",true(),null()) \ +| where isnull(has_earliest) \ +| eval search_id=substr(search_id,1) \ +| `search_type_from_sid(search_id)` \ +| eval total_run_time=round(total_run_time) \ +| where total_run_time>0 \ +| sort - total_run_time \ +| `base64decode(base64appname)` \ +| eval app_name=coalesce(app,base64appname) \ +| fillnull app_name, savedsearch_name value="" \ +| stats count, latest(_time) AS most_recent, values(info) AS info, list(total_run_time) AS total_run_time, values(search) AS search_example by user, type, savedsearch_name, app_name \ +| eval total_run_time=mvdedup(total_run_time), most_recent=strftime(most_recent, "%+") \ +| sort - total_run_time + +[SearchHeadLevel - DataModels report] +action.email.useNSSubject = 1 +alert.track = 0 +cron_schedule = 50 5 * * * +description = Report only? Yes. This report is required to support the audit log summary searches. Search Head specific? 
Yes +dispatch.earliest_time = @d +dispatch.latest_time = now +display.events.fields = ["host","source","sourcetype"] +display.general.timeRangePicker.show = 0 +display.general.type = statistics +display.page.search.tab = statistics +display.visualizations.chartHeight = 628 +display.visualizations.charting.chart = line +display.visualizations.show = 0 +enableSched = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +schedule_window = 30 +search = | rest /servicesNS/-/-/datamodel/model `splunkadmins_restmacro` search=eai:type=datamodel \ +| table title, description, eai:acl.sharing, eai:acl.app \ +| spath input=description path=objects{}.objectSearch output=objectSearch \ +| fields - description \ +| eval objectSearch=mvindex(objectSearch,0) \ +| rex field=objectSearch "^\s*\|?(?P<definition>[^\|]+)" \ +| rename title AS datamodel, eai:acl.sharing AS sharing, eai:acl.app AS app \ +| table datamodel, sharing, app, definition \ +| eval splunk_server="default" \ +| outputlookup splunkadmins_datamodels + +[SearchHeadLevel - Tags report] +action.email.useNSSubject = 1 +alert.track = 0 +cron_schedule = 50 5 * * * +description = Report only? Yes. This report is required to support the audit log summary searches. Search Head specific? Yes +dispatch.earliest_time = @d +dispatch.latest_time = now +display.events.fields = ["host","source","sourcetype"] +display.general.timeRangePicker.show = 0 +display.general.type = statistics +display.page.search.tab = statistics +display.visualizations.chartHeight = 628 +display.visualizations.charting.chart = line +display.visualizations.show = 0 +enableSched = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +schedule_window = 30 +search = | rest /servicesNS/-/-/configs/conf-tags `splunkadmins_restmacro` \ +| search eai:acl.sharing!="user" \ +| eval _raw="" \ +| foreach "*" \ + [| eval field=if(match("<<FIELD>>","^(title|eai:|splunk_server|author|id|updated|published)"),"","<<FIELD>> = ".'<<FIELD>>') \ + | eval _raw=mvappend(_raw,field) ] \ +| rex max_match=0 field=_raw "(?P<tag>\S+)\s+=\s+enabled" \ +| table tag, title, eai:acl.app, eai:acl.sharing \ +| rename title AS definition, eai:acl.app AS app, eai:acl.sharing AS sharing \ +| mvexpand tag \ +| eval splunk_server="default" \ +| outputlookup splunkadmins_tags + +[SearchHeadLevel - EventTypes report] +action.email.useNSSubject = 1 +alert.track = 0 +cron_schedule = 50 5 * * * +description = Report only? Yes. This report is required to support the audit log summary searches. Search Head specific? 
Yes
+dispatch.earliest_time = @d
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.timeRangePicker.show = 0
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.chartHeight = 628
+display.visualizations.charting.chart = line
+display.visualizations.show = 0
+enableSched = 0
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+schedule_window = 30
+search = | rest /servicesNS/-/-/saved/eventtypes `splunkadmins_restmacro` search=index search=disabled=0 \
+| search eai:acl.sharing!="user" \
+| rename eai:acl.app AS app, eai:acl.sharing AS sharing \
+| eval sharing=if(sharing=="system","global",sharing) \
+| table title, search, app, sharing \
+| rename search as definition, title AS eventtype \
+| eval splunk_server="default" \
+| outputlookup splunkadmins_eventtypes
+
+[SearchHeadLevel - splunk_search_messages dispatch]
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 3
+counttype = number of events
+cron_schedule = 7 * * * *
+description = Chance the alert requires action? Moderate. Search error messages are generally visible to users and often indicate an issue in the environment. Note this alert requires the splunk_search_messages sourcetype; on versions below 9.1 you must set log_search_messages = true under the [search] stanza in limits.conf so that the search_messages.log file is populated
+dispatch.earliest_time = -60m
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.charting.chart = area
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```Look for dispatch problems in the splunk_search_messages sourcetype \
+This does require the limits.conf log_search_messages=true setting to be enabled to work, if below version 9.1```\
+index=_internal `searchheadhosts` orig_component="DispatchThread" sourcetype=splunk_search_messages \
+| cluster t=0.4 showcount=true \
+| table _time, cluster_count, _raw \
+| sort - cluster_count
+disabled = 1
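+
+# For reference, the limits.conf change referenced above (required below Splunk 9.1 for search_messages.log,
+# and therefore the splunk_search_messages sourcetype, to be populated):
+# [search]
+# log_search_messages = true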
"'" \ +| appendpipe \ + [| map [ search index=_audit info=completed OR info=failed host=$host$ search_id="$search_id$" | eval search_lt=if(search_lt=="N/A",_time,search_lt), search_et=if(search_et=="N/A",now()-(365*24*60*60),search_et) | eval period=tostring(search_lt-search_et,"duration") | table user, total_run_time, search, search_id, period, savedsearch_name ] maxsearches=20 ] \ +| table user, total_run_time, search, search_id, period, savedsearch_name \ +| eval savedsearch_name=if(isnull(savedsearch_name),"ad-hoc",savedsearch_name) \ +| stats values(*) AS * by search_id +disabled = 1 + +[SearchHeadLevel - dispatch metadata files may need removal] +action.keyindicator.invert = 0 +alert.suppress = 0 +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 4 +counttype = number of events +cron_schedule = 57 */4 * * * +description = Chance the alert requires action? High. When this particular warning occurs repetitively it usually reuqires manual intervention from the Splunk admin to remove the dispatch directory. +dispatch.earliest_time = -4h +dispatch.latest_time = now +display.events.fields = ["host","source","sourcetype"] +display.general.type = statistics +display.page.search.tab = statistics +display.visualizations.charting.chart = area +enableSched = 1 +quantity = 0 +relation = greater than +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = ```This warning when occurring repetitively tends to indicate some kind of issue that will require the file to be manually removed. For example a zero sized metadata file that cannot be reaped by the dispatch reaper``` \ +index=_internal sourcetype=splunkd `splunkadmins_splunkd_source` WARN DispatchSearchMetadata \ +| stats count by event_message, host \ +| where count>100 \ +| rex field=event_message "file: (?P<filename>.*)" \ +| table filename, host, event_message +disabled = 1 + +[IndexerLevel - Slow peer from remote searches] +action.keyindicator.invert = 0 +alert.suppress = 0 +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 4 +counttype = number of events +cron_schedule = */30 * * * * +description = Chance the alert requires action? Moderate. This alert is an example of how to find if a single (or a few) search/indexing peers are returning results more slowly than other peers resulting in slow searches +dispatch.earliest_time = -30m +dispatch.latest_time = now +display.events.fields = ["host","source","sourcetype"] +display.general.type = statistics +display.page.search.tab = statistics +display.visualizations.charting.chart = area +enableSched = 1 +quantity = 0 +relation = greater than +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = ```This warning when occurring repetitively tends to indicate some kind of issue that will require the file to be manually removed. 
+
+[IndexerLevel - Slow peer from remote searches]
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 4
+counttype = number of events
+cron_schedule = */30 * * * *
+description = Chance the alert requires action? Moderate. This alert is an example of how to find if a single (or a few) search/indexing peers are returning results more slowly than other peers resulting in slow searches
+dispatch.earliest_time = -30m
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.charting.chart = area
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```Use remote_searches.log on the indexers to find a peer (or a few peers) that consistently return results more slowly than the other peers for the same search id``` \
+index=_internal `indexerhosts` source=*remote_searches.log terminated: OR closed: \
+| regex search!="^(pretypeahead|copybuckets)" \
+| rex "(?s) elapsedTime=(?P<elapsedTime>[0-9\.]+), search='(?P<search>.*?)(', savedsearch_name|\", drop_count=\d+)" \
+| rex "(terminated|closed): search_id=(?P<search_id>[^,]+)" \
+| regex search="^(litsearch|mcatalog|mstats|mlitsearch|litmstats|tstats|presummarize)" \
+| regex search_id="^remote" \
+| stats last(_time) AS _time, avg(elapsedTime) AS avgelapsedtime, max(elapsedTime) AS maxelapsedtime by search_id, host \
+| eventstats max(maxelapsedtime) AS slowest, avg(avgelapsedtime) AS average by search_id \
+| eval slow=average+`splunkadmins_slowpeer_time`, comment="Tested stddev() but what if the search is smaller than normal and some indexers take 5X longer, if the search was 3 seconds who cares" \
+| where maxelapsedtime>slow AND maxelapsedtime==slowest \
+| bin _time span=5m \
+| stats count by host, _time \
+| where count>`splunkadmins_slowpeer_threshold`
+disabled = 1
+
+#Enable scheduling on this report if you need to translate the RMD5 values in searches back into real search names (used in various searches)
+[SearchHeadLevel - RMD5 to savedsearch_name lookupgen report]
+action.email.useNSSubject = 1
+alert.track = 0
+cron_schedule = 50 5 * * *
+description = Report only? Yes. This report is required to support various other searches that translate the RMD5 values back into real savedsearch names. Search Head specific? Yes
+dispatch.earliest_time = -24h@h
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.timeRangePicker.show = 0
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.chartHeight = 628
+display.visualizations.charting.chart = line
+display.visualizations.show = 0
+enableSched = 0
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+schedule_window = 30
+search = ```Search the audit logs to find the RMD5 entries and record them into a lookup file``` index=_audit info=completed RMD5* search_id!="'rsa_*"\
+| regex search_id!="^'subsearch_"\
+| rex field=search_id "(?P<RMDvalue>RMD5[^_]+)"\
+| stats values(savedsearch_name) AS savedsearch_name by RMDvalue\
+| lookup splunkadmins_rmd5_to_savedsearchname RMDvalue OUTPUT savedsearch_name AS savedsearch_name_current\
+| where isnull(savedsearch_name_current)\
+| outputlookup splunkadmins_rmd5_to_savedsearchname append=true
+
+[SearchHeadLevel - Dashboards invalid character in splunkd]
+action.email.reportServerEnabled = 0
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 2
+counttype = number of events
+cron_schedule = 11 4 * * *
+description = Chance the alert requires action? Moderate. One or more invalid character messages appeared in the Splunkd logs. This may require additional investigation.
+dispatch.earliest_time = -1d@h +dispatch.latest_time = now +display.events.fields = ["host","source","sourcetype"] +display.statistics.drilldown = row +display.visualizations.charting.chart = line +display.visualizations.show = 0 +enableSched = 1 +quantity = 0 +relation = greater than +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = index=_internal `searchheadhosts` sourcetype=splunkd `splunkadmins_splunkd_source` "AdminManager" "invalid"\ +| rex "<label>(?P<dashboard_label>[^<]+)"\ +| rex "(?s)^(\S+\s+){3}(?P<error>.*)"\ +| stats count, latest(_time) AS mostrecent, earliest(_time) AS firstseen, values(host) AS hosts, values(dashboard_label) AS dashboard_label by error\ +| eval mostrecent=strftime(mostrecent, "%+"), firstseen=strftime(firstseen, "%+")\ +| table count, dashboard_label, error, mostrecent, firstseen +disabled = 1 + +[SearchHeadLevel - savedsearches invalid character in splunkd] +action.email.reportServerEnabled = 0 +action.keyindicator.invert = 0 +alert.suppress = 0 +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 2 +counttype = number of events +cron_schedule = 17 4 * * * +description = Chance the alert requires action? Moderate. One or more invalid character messages appeared in the Splunkd logs. This may require additional investigation. +dispatch.earliest_time = -1d@h +dispatch.latest_time = now +display.events.fields = ["host","source","sourcetype"] +display.statistics.drilldown = row +display.visualizations.charting.chart = line +display.visualizations.show = 0 +enableSched = 1 +quantity = 0 +relation = greater than +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = index=_internal `searchheadhosts` sourcetype=splunkd `splunkadmins_splunkd_source` "invalid" "toAtom"\ +| rex "(?s)^(\S+\s+){3}(?P<error>.*)"\ +| stats count, latest(_time) AS mostrecent, earliest(_time) AS firstseen, values(host) AS hosts by error\ +| eval mostrecent=strftime(mostrecent, "%+"), firstseen=strftime(firstseen, "%+") +disabled = 1 + +[SearchHeadLevel - datamodel errors in splunkd] +action.email.reportServerEnabled = 0 +action.keyindicator.invert = 0 +alert.suppress = 0 +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 2 +counttype = number of events +cron_schedule = 23 4 * * * +description = Chance the alert requires action? Moderate. One or more datamodel errors exist in the splunkd logs. This may require additional investigation. +dispatch.earliest_time = -1d@h +dispatch.latest_time = now +display.events.fields = ["host","source","sourcetype"] +display.statistics.drilldown = row +display.visualizations.charting.chart = line +display.visualizations.show = 0 +enableSched = 1 +quantity = 0 +relation = greater than +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = index=_internal `searchheadhosts` sourcetype=splunkd `splunkadmins_splunkd_source` "ERROR DataModelObject" OR "ERROR DataModel" NOT "because KV Store initialization has not completed yet" NOT "KV Store is shutting down"\ +| rex "(?s)^(\S+\s+){3}(?P<error>.*)"\ +| stats count, latest(_time) AS mostrecent, earliest(_time) AS firstseen, values(host) AS hosts by error\ +| eval mostrecent=strftime(mostrecent, "%+"), firstseen=strftime(firstseen, "%+")\ +| table count, mostrecent, firstseen, hosts, error +disabled = 1 + +[SearchHeadLevel - Search Messages field extractor slow] +action.email.useNSSubject = 1 +alert.track = 0 +cron_schedule = 52 5 * * * +description = Report only? Yes. 
Splunk search messages are showing slow field extractor messages; this requires the limits.conf setting log_search_messages=true if below version 9.1
+dispatch.earliest_time = -24h
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.general.timeRangePicker.show = 0
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.chartHeight = 628
+display.visualizations.charting.chart = line
+display.visualizations.show = 0
+enableSched = 0
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+schedule_window = 30
+search = index=_internal `searchheadhosts` sourcetype=splunk_search_messages "extractor" "slow"\
+| `search_type_from_sid(sid)`\
+| eval type=case(from=="scheduler","scheduled",from=="SummaryDirector","acceleration",isnotnull(searchname),"dashboard",1=1,"ad-hoc")\
+| multireport [ | `base64decode(base64username)` ] [ | eval keepme="yes"]\
+| `base64decode(base64appname)` \
+| eval app3="N/A" \
+| eval report=coalesce(searchname,searchname2), app=coalesce(app,base64appname,app3)\
+| rex field=message "^(\[subsearch\])?\s*\[[^\]]+\]\s+(?P<sub_message>.*?\()"\
+| fillnull app, username, report, sub_message value="N/A"\
+| stats count, latest(_time) AS mostrecent, earliest(_time) AS firstseen, values(message) AS message, values(host) AS hosts by app, username, type, report, sub_message\
+| lookup splunkadmins_rmd5_to_savedsearchname RMDvalue AS report OUTPUT savedsearch_name\
+| eval report=case(match(report,"^RMD") AND isnotnull(savedsearch_name),savedsearch_name,match(report,"^RMD"),"N/A",1=1,report) \
+| table username, app, report, message, mostrecent, firstseen, type, count, hosts\
+| sort - mostrecent\
+| eval mostrecent=strftime(mostrecent, "%+"), firstseen=strftime(firstseen, "%+")
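+
+# A quick, hypothetical way to confirm the log_search_messages setting is enabled on a search head before
+# relying on the splunk_search_messages based reports/alerts (verify the endpoint behaviour on your version):
+# | rest /services/configs/conf-limits/search splunk_server=local | fields log_search_messages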
+
+[SearchHeadLevel - Search Messages user level]
+action.email.reportServerEnabled = 0
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 2
+counttype = number of events
+cron_schedule = 28 4 * * *
+description = Chance the alert requires action? Moderate. This is designed for use with something like sendresults to send the failures to the owner of the mentioned search; this requires the limits.conf setting log_search_messages=true if below version 9.1. This alert relies on "SearchHeadLevel - RMD5 to savedsearch_name lookupgen report" to obtain accurate results for the savedsearch name
+dispatch.earliest_time = -1d@h
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.statistics.drilldown = row
+display.visualizations.charting.chart = line
+display.visualizations.show = 0
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```Attempt to find various messages in the splunk_search_messages which are related to scheduled searches or dashboards which may require correcting, ignore ad-hoc searches\
+This does require the limits.conf log_search_messages=true setting to be enabled to work, if below version 9.1```\
+ index=_internal `searchheadhosts` sourcetype=splunk_search_messages "MultiValueProcessor" OR "SearchStatusEnforcer" OR "SearchOperator" OR "SQL" OR "truncated" OR "script" OR "KV" OR "External" OR "outputcsv" OR "reset" OR "match_limit" OR authorized OR terminated OR depth_limit OR driver OR dbx OR command OR java OR reading OR training OR DensityFunction OR (TERM(in-memory) limit) OR invalid OR (missing NOT orig_component="SummaryIndexProcessor") OR Unable OR reset OR AutoLookupDriver OR ImportError OR jdbc OR SSLError OR TERM(code=) OR REST OR SearchParser \
+OR ("Can't" "parse") OR "field(s) do not exist" OR ("setting" "deprecated") OR ("ignored" "missing") OR "Dropping field(s) with too many distinct values" OR "does not exist" OR orig_component="TsidxStats" OR orig_component="SearchOrchestrator" OR orig_component="ForeachProcessor" OR orig_component="SearchPhaseGenerator" OR orig_component="SearchProcessor" OR HTTPError OR ("Can not find object" "of type connection.") OR message_key="EVAL:BOOLEAN_RESULT" \
+```Potential issues that are not included SearchEvaluatorBasedExpander, shows if eventtypes/tags are disabled/do not exist or similar``` `splunkadmins_searchmessages_user_1` \
+ NOT "KV Store lookup table is empty" NOT "message=Restricting results of the \"rest\" operator to the local instance because you do not have the" NOT "Failed to fetch REST endpoint uri=https://127.0.0.1:8089/services/data/indexes-extended/" NOT "Unexpected status for to fetch REST endpoint uri=https://127.0.0.1:8089/services/data/indexes-extended" NOT "Failed to fetch REST endpoint uri=https://127.0.0.1:8089/services/data/indexes" NOT "The REST request on the endpoint URI /services/data/indexes" NOT "message=Could not locate the time (_time) field on some results returned from the external search command 'curl'" NOT "message=Found no results to append to collection" NOT "The search you ran returned a number of fields that exceeded the current indexed field extraction limit" NOT "Connection failed with Read Timeout" NOT "message=Search was canceled" NOT "message=Search auto-canceled" NOT "The timewrap command is designed to work on the output of timechart" NOT ("Field" "does not exist") NOT "Connection reset by peer" NOT "Reading error while waiting for peer" NOT "Restricting results of the \"rest\" operator to the local instance" NOT "occur when processing chunks in running lookup command" NOT "because KV Store initialization has not completed yet" NOT "The following options were
specified but have no effect" NOT "https://127.0.0.1:8089/servicesNS/nobody/SA-ITOA/itoa_interface/generate_entity_filter" NOT "because KV Store status is currently unknown" NOT ("https://127.0.0.1:8089/services/server/introspection/kvstore/collectionstats" OR "https://127.0.0.1:8089/services/server/sysinfo" ("exists in the REST API" OR "Forbidden")) NOT ("https://127.0.0.1:8089/services/data/indexes-extended" OR "https://127.0.0.1:8089/services/data/indexes" ("Not Found" OR "exists in the REST API")) NOT "Only the last one will appear, and previous" NOT ("Field extractor" "unusually slow") \
+ NOT "Unable to distribute to peer" NOT (Eventtype "does not exist or is disabled") NOT "Unable to find tag" NOT "reference cycle in the lookup configuration" NOT "Search cancellation requested." NOT "because KV Store is shutting down" NOT "The 'require' command received zero events or results" NOT "Bundle replication to peer named" ``` this should only occur until the bundle gets to the indexer in question ``` NOT "Application does not exist"\
+```OR TERM(filters) was originally in the query, but the error \"Search filters specified using splunk_server/splunk_server_group do not match any search peer.\" can occur anytime there are zero results, even if the splunk_server=/splunk_server_group= was not the cause of the issue, therefore this particular warning is not useful in its current form...``` \
+| regex sid!="^(rt_)?(ta_)?(subsearch_)*(nested_[^_]+_)?\d+" \
+| `search_type_from_sid(sid)`\
+| eval type=case(from=="scheduler","scheduled",from=="SummaryDirector","acceleration",isnotnull(searchname),"dashboard",1=1,"ad-hoc") \
+| search ```Depending on how noisy this alert is you may wish to add type!=dashboard using the macro splunkadmins_searchmessages_user_2``` NOT ("command=\"predict\", Too few data points" AND type="dashboard") NOT (type="dashboard" "https://127.0.0.1:8089/servicesNS/-/-/admin/file-explorer") NOT (type="dashboard" "https://127.0.0.1:8089/servicesNS/-/-/admin/file-explorer" OR "The specified span would result in too many") NOT (type="ad-hoc" "DAG Execution Exception: Search has been cancelled") `splunkadmins_searchmessages_user_2` \
+| `base64decode(base64username)` \
+| `base64decode(base64appname)` \
+| eval app3="N/A" \
+| eval report=coalesce(searchname,searchname2), app=coalesce(app,base64appname,app3), username=coalesce(username,base64username) \
+| fillnull app, username, report value="N/A" \
+| eval search_head=host \
+| eval search_head_cluster=`search_head_cluster` \
+| stats count, latest(_time) AS mostrecent, earliest(_time) AS firstseen, values(message) AS message, values(search_head_cluster) AS search_head_cluster, values(orig_component) AS orig_component, values(sid) AS search_ids by app, report, username, type \
+| eval search_ids=mvindex(search_ids,0,10) \
+| lookup splunkadmins_rmd5_to_savedsearchname RMDvalue AS report OUTPUT savedsearch_name \
+| eval report=case(match(report,"^RMD") AND isnotnull(savedsearch_name),savedsearch_name,match(report,"^RMD"),"N/A",1=1,report) \
+| eval reason=case(type=="dashboard","Errors from viewing one or more dashboards, the dashboard owner can likely fix this if you can determine which dashboard is an issue, or contact the Splunk admin team",type=="scheduled","Please review and correct this error or contact the Splunk admin team for assistance",type=="acceleration","Broken acceleration/summary search, admin investigation required via audit index",1=1,"Unknown type") \
+| eval message=mvindex(message,0,30) \
+| table username, reason, app, report, message, mostrecent, firstseen, type, count, search_head_cluster, orig_component, search_ids \
+| eval mostrecent=strftime(mostrecent, "%+"), firstseen=strftime(firstseen, "%+")
+disabled = 1
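+
+# When wiring this alert to something like sendresults (per the description), the username column is the
+# natural recipient key; a hypothetical final pipe to build the address (the domain is an example only):
+# | eval email=username . "@example.com"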
+
+[SearchHeadLevel - Search Messages admins only]
+action.email.reportServerEnabled = 0
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 2
+counttype = number of events
+cron_schedule = 28 4 * * *
+description = Chance the alert requires action? Moderate. This is designed for use with something like sendresults to send the failures to the owner of the mentioned search; this requires the limits.conf setting log_search_messages=true if below version 9.1.
+dispatch.earliest_time = -1d@h
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.statistics.drilldown = row
+display.visualizations.charting.chart = line
+display.visualizations.show = 0
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = ```Attempt to find various messages in the splunk_search_messages which are related to scheduled searches or dashboards which may require correcting, ignore ad-hoc searches\
+This does require the limits.conf log_search_messages=true setting to be enabled to work. If below version 9.1```\
+ index=_internal `searchheadhosts` sourcetype=splunk_search_messages (Unable peer) OR bundles OR "bundle replication" OR corrupt OR connecting OR ReadWrite OR Socket OR Timed OR incomplete OR cleanly OR Timeout OR process OR insufficient OR (bucket failed) OR "occur when processing chunks in running lookup command" OR "because KV Store status is currently unknown" OR (File line) OR (SearchPipelineExecutor NOT "exceeded configured match_limit") OR S2BucketCache OR DistributedSearchResultCollectionManager OR ("Field extractor" "unusually slow") OR "line *:" OR GeoIPProvider OR "restricting search to" OR ExternalProvider OR message_key="SUMMARIZE:PEER_NOT_FINISHED_AFTER_MAXTIME_EXCEEDED" \
+NOT "Unable to find tag" NOT "Unable to parse the search" NOT ("Eventtype" "does not exist") NOT "Error in 'outputlookup' command: You have insufficient privileges" NOT "insufficient data in ITSI summary index for policies" \
+ ```Potential issues that are not included SearchEvaluatorBasedExpander, shows if eventtypes/tags are disabled/do not exist or similar``` \
+NOT ("Failed to fetch REST endpoint" "/services/data/indexes-extended" "Check that the URI path provided exists in the REST API" OR "Not Found") NOT "Found no results to append to collection"\
+`splunkadmins_searchmessages_admin_1`\
+| `search_type_from_sid(sid)`\
+| eval type=case(from=="scheduler","scheduled",from=="SummaryDirector","acceleration",isnotnull(searchname),"dashboard",1=1,"ad-hoc")\
+| search `splunkadmins_searchmessages_admin_2`\
+| `base64decode(base64username)` \
+| `base64decode(base64appname)` \
+| eval app3="N/A" \
+| eval report=coalesce(searchname,searchname2), app=coalesce(app,base64appname,app3), username=coalesce(username,base64username) \
+| fillnull app, username, report, message, orig_component value="N/A"\
+| eval search_head=host\
+| eval search_head_cluster=`search_head_cluster`\
+| eval combined=message . type . orig_component . search_head_cluster\
+| cluster showcount=true field=combined t=0.90\
+| stats sum(cluster_count) AS count, latest(_time) AS _time, values(search_head_cluster) AS search_head_cluster, values(orig_component) AS orig_component, values(sid) AS search_ids by app, message, type\
+| eval search_ids=mvindex(search_ids,0,10)\
+| table count, app, message, _time, type, search_head_cluster, orig_component, search_ids\
+| append [ | makeresults | eval count=99999, app="N/A", message="cluster command in use, all apps/type/search head cluster may not be accurate. The type and message columns are the important point" | fields - _time ]\
+| sort - count, _time
+disabled = 1
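+
+# Note the cluster command above (t=0.90 on the combined field) groups near-identical messages, so the
+# appended makeresults row is a reminder that the values() based columns can mix details from different
+# underlying messages; treat app/search_head_cluster as indicative rather than exact.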
+
+[AllSplunkEnterpriseLevel - Splunkd Log Messages Admins Only]
+action.email.reportServerEnabled = 0
+action.keyindicator.invert = 0
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 2
+counttype = number of events
+cron_schedule = 43 4 * * *
+description = Chance the alert requires action? Moderate. This is an attempt to alert on almost any splunkd related log message which might be of interest to the admin team. Note that some items were excluded, such as "SearchOperator:savedsplunk"; while this exists in the splunkd log, https://ideas.splunk.com/ideas/EID-I-796 advises why it is not useful as an error (vote if interested)
+dispatch.earliest_time = -1d@h
+dispatch.latest_time = now
+display.events.fields = ["host","source","sourcetype"]
+display.statistics.drilldown = row
+display.visualizations.charting.chart = line
+display.visualizations.show = 0
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = index=_internal `splunkenterprisehosts` `splunkadmins_splunkd_log_messages` \
+```OR (TERM(DateParserVerbose) TERM(consecutive)) was previously included in the splunkd.log section but the message about unretrievable data does not appear to be accurate, Splunk auto-increments the timestamp by 1s every 200K events so this is not an issue as such...``` \
+ (sourcetype=splunkd (`splunkadmins_splunkd_source`) WARN OR ERROR MongoModificationsTracker OR TERM(SearchOperator:kv) OR AuditTrailManager OR IniFile OR GetBundleListTransaction OR GenericConfigKeyHandler OR AuthorizationManager OR GetRemoteAuthToken OR DistributedPeer OR (Archiver Permission) OR GetIndexListTransaction OR (DistributedPeerManager Timeout OR TERM(status=Down)) OR CalcFieldProcessor OR FieldAliaser OR (SearchScheduler OR DispatchManager "minimum free disk space") OR ApplicationUpdater OR (ScopedLDAPConnection NOT "Might indicate slow ldap server" NOT "Converting non-UTF-8 value to") OR regexExtractionProcessor OR (ProcessTracker NOT ConfMetrics) OR ConfReplication OR TailingProcessor OR "Invalid cron_schedule" OR "Persistent file" OR "Too many indexes" OR (UserManagerPro Strategy) OR SearchProcessMemoryTracker OR SSLOptions OR (SHCRepJob misspelled) OR PivotEvaluator OR PropertiesMap OR HTTPAuthManager OR X509Verify OR FilesystemChangeWatcher OR PropsKeyHandler OR IndexProcessor OR BundleArchiver OR (ApplicationManager NOT "Skipping update check for app id" NOT "This is expected if you push an app from the cluster master") OR ISplunkDispatch OR TcpInputConfig OR (CollectionConfHandler Bad OR reload) OR SLConstants OR TERM(AdminHandler:AuthenticationHandler) OR (DispatchManager NOT (failedtostart OR quota OR QUEUED OR concurrency OR concurrently)) OR KVStoreBulletinBoardManager OR CMRestIndexerDiscoveryHandler\
+OR KVStoreConfigurationProvider OR
LMMasterRestHandler OR LMHttpUtil OR (DatabaseDirectoryManager Detecting) OR (No space NOT SHCRepJob NOT DispatchManager) OR (baseline configuration replicating) OR LMTracker OR IndexerDiscoveryHeartbeatThread OR ModularUtility OR ScriptRestHandler OR (component IN (S3Client, RetryableClientTransaction) (TERM(success=N) OR ("statusCode=") NOT TERM(statusCode=404) NOT TERM(retry=Y))) \ +OR WorkloadConfig OR "WARN loader" OR "ERROR loader" OR (TERM(AdminHandler:AuthenticationHandler) reasonable)\ +OR (KVStoreLookup OR KVStoreProvider OR SingleLookupDriver OR outputcsv OR TERM(SearchOperator:inputcsv) NOT "You have insufficient privileges" NOT "KV Store initialization" NOT "KV Store is shutting down" NOT "Found no results" NOT "lookup context" NOT "searchparsetmp" NOT "Invalid argument" NOT "must be followed by a search clause") OR ConfigEncryptor OR AesGcm\ +OR GenerationGrabber OR CMSearchHead OR DistHealthFetcher OR SpecFiles OR DeploymentServer OR DistributedPeerManagerHeartbeat OR MongodRunner OR (TERM(DS_DC_Common) NOT "attributes cannot be handled by WebUI" NOT "Attribute unsupported by UI") OR STMgr OR (heartbeat SHCSlave OR SHCMasterHTTPProxy OR failure) OR ServerInfoHandler OR BucketReplicator OR (TcpInputProc Stopping) OR StreamGroup \ +OR (ScriptRunner Killing OR stderr) OR LMStackMgr OR (DatabaseDirectoryManager corrupt) OR (BucketMover exited) OR ("KVStorageProvider" NOT "Result size too large" NOT "Too many rows in result") OR DistributedPeerManager OR (HttpClientRequest NOT "Broken pipe") OR (UserManagerPro NOT "Login failed" NOT "Failed to find ldapuser" NOT "Failed to get ldapuser") OR (component=AutoLoadBalancedConnectionStrategy NOT "Possible duplication" NOT "no raw data") OR AppsDeployHandler OR SHCConfig OR (ClusterMasterControlHandler NOT "No new dry run will be performed") OR RaftSimpleFileStorage \ +OR IConfCache OR (WorkloadManager NOT "Failed to select user provided workload_pool" NOT "trans") OR WorkloadClass OR AdminManagerExternal OR (SavedSearchAdminHandler NOT ("Unbalanced quotes", "Invalid cron_schedule", "Invalid search id, dispatch directory does not exist", "specifies a macro 'nix_app_index' that cannot be found", "Empty string is not a valid search string", "Cannot change user and/or app context of a report that is embedded")) OR JournalSlice OR PipelineComponent OR IndexConfig OR RawdataHashMarkReader OR ArchiveContext OR DateParser OR TimeoutHeap OR LMStackMgr OR AutoLookupDriver OR (TERM(spatial:PointInPolygonIndex) corruption) OR TERM(IntrospectionGenerator:resource_usage) OR PasswordHandler OR ConfigEncryptor OR AesGcm OR ModularInputs OR component IN (IndexerService,RetireOldS2S,UserManager,regexExtractionProcessor,Regex) OR (IndexingBundleLookupThread ```IndexingBundleLookupThread can occur when the transforms.conf has a kvstore but not the collection= so [kvdef] external_type = kvstore fields_list= ... 
is valid, but without collection= it can throw this error on 8.2.5, if updating via REST to /data/transforms/lookups include external_type/fields_list and collection= in the POST```) \
+OR (ChunkedExternProcessor ```Note ChunkedExternProcessor introduces noise as well as legitimate errors```) OR (SHCRepJob OR SHCMasterArtifactHandler Reason) OR (ExecProcessor message from NOT InsecureRequestWarning) OR (Crypto Decryption) OR (CacheManagerHandler failure) OR (component=ExecProcessor Errno OR Unexpected OR Expected OR Ignoring NOT InsecureRequestWarning) OR (ConfMetrics NOT "single_action=BASE_INITIALIZE" ```more research required on how or if these require tuning, but they likely relate to SHC issues``` ) \
+```included in other alerts: CMMasterProxy, AutoLoadBalancedConnectionStrategy (data duplication/timeouts), ExecProcessor?``` OR (DistributedBundleReplicationManager ```This is confirmed as an invalid warning message in Splunk 9``` NOT "Failed to touch bundle=, checksum=0 (manual preparation): No such file or directory") OR (SearchScheduler SearchProcessorException capability) OR (DispatchManager sufficient) OR (SearchScheduler sufficient) OR BundlesUtil OR AwsCredentials OR CMBundleStreamHandler OR (CMMaster Cannot) OR "fd limit" \
+OR (component=SearchProcessRunner NOT "RequireProcessor" NOT "hung up" NOT (log_level=WARN code=111 OR exit=111) NOT (log_level=ERROR "caught exception") ```the following are not considered an issue WARN SearchProcessRunner [37354 PreforkedSearchesManager-0] - preforked process=0/1607321 with search=0/2039381 exited with code=111, ERROR SearchProcessRunner [37354 PreforkedSearchesManager-0] - preforked search=0/2039381 on process=0/1607321 caught exception. completed_searches=2, process_started_ago=15.511, search_started_ago=6.788, search_ended_ago=0.000, total_usage_time=10.580, ERROR SearchProcessRunner [37354 PreforkedSearchesManager-0] - preforked process=0/1607321 died on exception (exit code=111): Error in 'RequireProcessor': The 'require' command received zero events or results; the search will be intentionally stopped``` )\
+OR component=Saml OR component=FileClassifierManager OR component=HttpPubSubConnection OR component=KVStoreBackupRestore OR component=TelemetryHandler OR component=AdminManagerValidation OR component IN (RfsDestination, RfsOutputProcessor) OR component IN (AuthenticationManagerSplunk, RetireOldS2S, JsonWebTokenHandle, AwsSDK, IndexerIf, Application) OR "exited with status code" OR "Error in 'script'" OR "Script execution failed" OR (component=JsonWebToken NOT "Token signature was valid, but could not find token") \
+``` this is covered by "SearchHeadLevel - KVStore Or Conf Replication Issues Are Occurring" as well ``` OR component=ConfReplicationThread OR (component=DiskMon AND log_level=ERROR) ``` this can be a little bit noisy, if related to the indexers perhaps more eviction padding will help?
``` OR (component=SHCMasterHTTPProxy "captain as down") OR component=ServerInfoHandler OR component=SHCConfig OR "active replication count >= max_peer_rep_load" OR (component=SearchScheduler NOT "maximum disk usage quota" NOT "based on their role quota" NOT "Alert script execution failed" NOT "Alert script returned error code" ``` these last two should be covered by other alerts```) OR "Application does not exist" OR "account has expired" OR "You do not have a role" OR component=JsonWebTokenHandler OR component=SearchLogCopier OR component=BulletinBoard OR component=RfsOutputProcessor* ``` note this can be missed with the shutdown macro ``` OR setManualDetention OR component=InstalledFilesHashChecker OR component=PropertiesMap OR component=TcpOutputFd OR component IN (HTTPServer, HttpInputServer) OR (component=HandleJobsDataProvider "exceeds") OR component=LoadLDAPUsersThread \ + NOT ("Configuration from app" "does not support reload") ```This is a harmless error message; tsidx is optimized after this error appears``` ```txn close did not succeed completely while flushing and closing a tsidx file rc=-8. Can be self-repaired in some cases but not all, so you may need to check on the bucket to see if it's an issue. It can relate to large >20MB+ events with slower IO for example``` \ + NOT "Rounded off to 100% to handle the interval drift" ) NOT ("CacheManager Cannot determine amount of free space for partition of dir" "No such file or directory") NOT ("S2SFileReceiver" "No such file or directory") NOT ("KVStorageProvider" "Insert data failed" "already exists") NOT ("SearchOperator:inputcsv" "might contain invalid operators") NOT ("INFO" "BucketReplicator" "successful" OR "Starting replication of bucket" OR "event=finishBucketReplication" OR "event=localReplicationFinished" OR "event=replicationFinished" OR "event=startBucketReplication") NOT ("INFO" "SpecFiles" "Found external scheme definition for stanza") \ + NOT ("INFO" "IndexProcessor" "removing replication target temp") NOT ("INFO" "ModularInputs" "Endpoint argument settings for") \ +```these may require more investigation. Ignoring for now Aug 2022``` NOT ("ERROR CacheManager" "No such file or directory") NOT ("ERROR BucketReplicator" "The bucket may have frozen") NOT ("BucketReplicator" "Failed to check the hotness of bucketId") \ + OR (sourcetype=scheduler source=*scheduler.log AlertNotifier WARN) \ + OR (sourcetype=splunkd (`splunkadmins_splunkd_source`) INFO (IndexWriter paused ```May relate to maxConcurrentOptimizes in indexes.conf or perhaps maxRunningProcessGroups or spikes in data per indexer```) OR (component=HotDBManager "unflushed buckets") OR (TERM(event=reclaimMemory) IndexProcessor OR StreamingBucketBuilder ```May relate to the memPoolMB / maxMemMB setting in indexes.conf or the IndexWriter getting paused. However data balance (too much MB/s of ingestion on a single indexer/uneven balance) appears to cause this too```)) \ +| search ```ignore shutdown times to remove errors that relate to shutdowns; note this may remove some legitimate alerts as well``` NOT [ `splunkadmins_shutdown_time_by_period(splunkenterprisehosts,60,60,10)` ] \ +| eval search_head=host \ +| eval search_head_cluster=`search_head_cluster` \ +| search ```Exclude time periods where shutdowns were occurring.
While this makes the alert less noisy it removes some legitimate errors too``` NOT \ + [ `splunkadmins_shutdown_time_by_shc(searchheadhosts,60,60)`] \ +| eval message=coalesce(message,event_message) \ +| rex mode=sed field=message "s/^\([^\)]+\)\s+(ProcessTracker\s+-\s+)?(\([^\)]+\)\s+)?IndexConfig/IndexConfig/g" \ +| rex mode=sed field=message "s/^sid:[^ ]+//g" \ +| rex mode=sed field=message "s/snapshot:\s+[^;]+;\s+Configurations changed while generating snapshot, original_latest_change=[^,]+, new_latest_change=[^,]+/snapshot: <bundledir> Configurations changed while generating snapshot original_latest_change=<removed>, new_latest_change=<removed>/" \ +| rex mode=sed field=message "s/Error getting modtime:\s+[^:]+/Error getting modtime: <dir>/g" \ +| rex field=message mode=sed "s/uri=(https?:\/\/([^\/]+\/){4})\S+/uri=\1/" \ +| rex field=message mode=sed "s/(<Resource>(\/[^\/]+){3}\/)[^<]+/\1/" \ +| rex field=message mode=sed "s/<RequestId>[^<]+<\/RequestId>/<RequestId>removed<\/RequestId>/" \ +| rex field=message mode=sed "s/transactionId=\S+\s+rTxnId=\S+/transactionId=removed rTxnId=removed/" \ +| rex field=message mode=sed "s/snapshot exists at op_id=\S+/snapshot exists at op_id=removed/" \ +| rex field=message mode=sed "s/(search_id=\"[^_]+_+[^_]+)[^\"]+/\1/" \ +| rex field=message mode=sed "s/bid=\S+/bid=?/" \ +| rex field=message mode=sed "s/JSON parse error at offset \d+ of file \".*? Unexpected/JSON parse error at offset <x> of file: Unexpected/" \ +| rex field=message mode=sed "s/Possible duplication of events with channel=.*?,\s+.*?host=/Possible duplication of events with channel=removed_by_sed host=/" \ +| eval search_head=host \ +| eval search_head_cluster=`search_head_cluster` \ +| stats count, latest(_time) AS mostrecent, earliest(_time) AS firstseen, values(component) AS component, values(log_level) AS log_level by message, search_head_cluster \ +| search NOT (component IN (TcpOutputFd, AutoLoadBalancedConnectionStrategy) count<3) NOT (component IN (S3Client) (message="*Read Timeout*" OR message="*statusCode=500*") count<3) \ +| eval comb_message = log_level . " " . component . " " . message \ +| eval mostrecent=strftime(mostrecent, "%+"), firstseen=strftime(firstseen, "%+") \ +| table comb_message, search_head_cluster, count, mostrecent, firstseen \ +| cluster field=comb_message showcount=true t=0.9 \ +| fields - cluster_label \ +| sort comb_message, cluster_count +disabled = 1 + +[DeploymentServer - Error Found On Deployment Server] +action.keyindicator.invert = 0 +alert.suppress = 0 +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 3 +counttype = number of events +cron_schedule = 14 */4 * * * +description = Chance the alert requires action? Moderate.
An application was not found or another deployment server error has occurred; this is more generic than the specific DeploymentServer - * alerts +dispatch.earliest_time = -4h@h +dispatch.latest_time = now +display.events.fields = ["host","source","sourcetype"] +display.general.type = statistics +display.page.search.tab = statistics +display.visualizations.charting.chart = area +enableSched = 1 +quantity = 0 +relation = greater than +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = ```This usually indicates a misconfigured serverclass.conf or a missing application from the deployment-apps directory``` \ +index=_internal `deploymentserverhosts` "ERROR Serverclass" OR "ERROR DSManager" OR ("WARN DeploymentServer") OR CASE(" FATAL ") OR (TERM(DS_DC_Common) NOT "attributes cannot be handled by WebUI") sourcetype=splunkd (`splunkadmins_splunkd_source`) \ +| cluster showcount=true \ +| table _time, _raw, cluster_count +disabled = 1 + +[SearchHeadLevel - Splunk alert actions exceeding the max_action_results limit] +action.keyindicator.invert = 0 +alert.suppress = 0 +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 3 +counttype = number of events +cron_schedule = 14 */4 * * * +description = Chance the alert requires action? Moderate. One or more alerts exceeded the max_action_results setting in limits.conf. If max_action_results is exceeded, the alert action receives only part of the results to work with; this can be a problem with the lookup alert action among others. Note that there is no log entry for this in splunkd as of 8.1.1; refer to https://ideas.splunk.com/ideas/EID-I-781 to vote on having log messages for this issue +dispatch.earliest_time = -4h@h +dispatch.latest_time = now +display.events.fields = ["host","source","sourcetype"] +display.general.type = statistics +display.page.search.tab = statistics +display.visualizations.charting.chart = area +enableSched = 1 +quantity = 0 +relation = greater than +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = ```If the max_action_results is exceeded the alert action receives only part of the results to work with; this can be a problem with the lookup alert action or others...``` index=_internal `searchheadhosts` sourcetype=scheduler alert_actions!="" ``` summary indexing shows up as an alert action but uses | summaryindex, this should not be an issue ``` alert_actions!="summary_index" `splunkadmins_alertactions_max_action_results`\ + [| rest /services/configs/conf-limits `splunkadmins_restmacro` f=title f=max_action_results\ + | search title=scheduler\ + | eval search="result_count>" .
max_action_results\ + | fields search]\ +| stats count, values(alert_actions) AS alert_actions, earliest(_time) AS firstSeen, latest(_time) AS lastSeen, max(result_count) AS result_count by user, app, savedsearch_name \ +| append \ + [| rest /services/configs/conf-limits `splunkadmins_restmacro` f=title f=max_action_results \ + | search title=scheduler \ + | fields max_action_results ] \ +| eventstats max(max_action_results) AS max_action_results \ +| eval firstSeen = strftime(firstSeen, "%+"), lastSeen=strftime(lastSeen, "%+") \ +| where isnotnull(count) \ +| appendpipe \ + [| map search="| rest /servicesNS/$user$/$app$/saved/searches `splunkadmins_restmacro` | search title=\"$savedsearch_name$\" \ + | eval savedsearch_name=\"$savedsearch_name$\", app=\"$app$\", user=\"$user$\"\ + | table actions, action.*, savedsearch_name, app, user" maxsearches=20\ + ]\ +| stats values(*) AS * by savedsearch_name, app, user\ +| eval remove=case('action.email'="1" AND isnull('action.email.sendresults'),"remove",1=1,null())\ +| where isnull(remove)\ +| eval message="One or more of your alerts are attempting to use an alert action with a number of events/results that exceeds the max_action_results limit; Splunk will truncate results beyond the max_action_results limit listed in the table when running the alert action..." \ +| table message, user, app, savedsearch_name, alert_actions, result_count, max_action_results, count, firstSeen, lastSeen +disabled = 1 + +[SearchHeadLevel - authorize.conf settings will prevent some users from appearing in the UI] +action.email.reportServerEnabled = 0 +action.keyindicator.invert = 0 +alert.suppress = 0 +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 2 +counttype = number of events +cron_schedule = 43 4 * * * +description = Chance the alert requires action? Moderate. This alert attempts to find a list of roles that have capabilities that the admin role (or roles inheriting the admin role) does not have. The issue is that the Settings -> Users UI page, or the /services/authentication/users REST endpoint, will not show users *if* the grantableRoles setting is used on that particular role. Since this setting can be set by the UI itself, an issue can occur where some users do not appear in Settings -> Users but are cached by Splunk correctly; you just cannot see them. \ +The page https://docs.splunk.com/Documentation/Splunk/latest/Admin/authorizeconf describes the grantableRoles setting in more detail; this is definitely an edge case but it may be worth detecting...
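As a quick manual check before relying on this alert, a minimal sketch (re-using this app's `splunkadmins_restmacro` macro and the grantable_roles field exposed by the same REST endpoint the alert uses) that simply lists which roles currently set grantableRoles:

| rest /servicesNS/-/-/authorization/roles `splunkadmins_restmacro` f=title f=grantable_roles f=imported_roles
| where isnotnull(grantable_roles) ```keep only roles that set grantableRoles```
| table title, grantable_roles, imported_roles

Any role returned here is a candidate for the user-visibility issue described above.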
+dispatch.earliest_time = -1d@h +dispatch.latest_time = now +display.events.fields = ["host","source","sourcetype"] +display.statistics.drilldown = row +display.visualizations.charting.chart = line +display.visualizations.show = 0 +enableSched = 1 +quantity = 0 +relation = greater than +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = | rest /servicesNS/-/-/authorization/roles `splunkadmins_restmacro` \ +| eval comb_capabilities=mvappend(imported_capabilities,capabilities) \ +| where (title="admin" OR imported_roles="admin") AND isnotnull(grantable_roles) \ +| eval isadminrole="true" \ +| stats count by comb_capabilities, title, isadminrole \ +| rename count AS admin_count \ +| rename comb_capabilities AS capabilities \ +| append \ + [| rest /servicesNS/-/-/authorization/roles `splunkadmins_restmacro` \ + | where (title!="admin" AND imported_roles!="admin") OR isnull(grantable_roles) \ + | stats count by capabilities, title ] \ +| fillnull isadminrole value="false" \ +| stats count, values(isadminrole) AS isadminrole, values(title) AS title, max(admin_count) AS admin_count by capabilities \ +| eventstats max(admin_count) AS admin_count \ +| where isadminrole="false" AND NOT isadminrole="true" AND admin_count>0 \ +| stats values(capabilities) AS capabilities by title \ +| rename title AS role \ +| eval comment="If the mentioned roles are granted to zero or more users, then the users will no longer be visible in the Settings -> Users UI page, or in the /services/authentication/users REST endpoint due to the grantableRoles setting as per https://docs.splunk.com/Documentation/Splunk/latest/Admin/authorizeconf. Therefore you may wish to either remove the grantableRoles setting from the mentioned admin role(s) or alternatively add additional inherited roles/capabilities into the mentioned admin role(s) to ensure the users are visible in the mentioned REST endpoint / UI page. If no users have this role ignore this message..." \ +| table comment, role, capabilities \ +| search `splunkadmins_authorize_conf_prevent_users` +disabled = 1 + +[IndexerLevel - RemoteSearches Indexes Stats Wilcard] +action.email.useNSSubject = 1 +alert.track = 0 +cron_schedule = 38 * * * * +description = Report only? Yes. This is an example of using the remote_searches.log on the indexers to determine which indexes are in use. This version matches wildcards and will be inaccurate due to the lack of role information per user (the audit-log-based report SearchHeadLevel - Search Queries summary non-exact match will work better for that purpose). This example search checks if an index is ever accessed via wildcards. Note this report requires the SearchHeadLevel - Index list by cluster report to run and output a lookup. Note that this search utilises the streamfilterwildcard custom search command included in the TA-Alerts for SplunkAdmins application on SplunkBase (or github) +dispatch.earliest_time = -65m@m +dispatch.latest_time = -5m@m +display.events.fields = ["index","sourcetype","host"] +display.general.type = statistics +enableSched = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = ```Attempt to determine index access via the remote_searches.log file, useful for when you cannot see the audit logs of all incoming search heads.
This version looks for wildcards and is not expected to be super-accurate: while we can determine the incoming server, and sometimes the incoming user from the search id, we cannot accurately determine the roles of the user without building yet more lookups and complexity. Therefore this search exists only to roughly summarize if an index was ever accessed via wildcards or not at the indexing tier``` \ + index=_internal sourcetype=splunkd_remote_searches source="/opt/splunk/var/log/splunk/remote_searches.log" terminated: OR closed: ```Note that TERM(starting) has the apiStartTime, apiEndTime stats, but lacks the useful stats from a search that is complete. Also note that on indexers scan_count=events_count (in my testing). Finally the elapsedTime sometimes failed to auto-extract, perhaps due to length...``` \ +| rex "(?s) elapsedTime=(?P<elapsedTime>[0-9\.]+), search='(?P<search>.*?)(', savedsearch_name|\", drop_count=\d+)" \ +| regex search!="^(pretypeahead|copybuckets)" \ +| rex "drop_count=[0-9]+, scan_count=(?P<scan_count>[0-9]+)" \ +| rex "total_slices=[0-9]+, considered_buckets=(?P<considered_count>[0-9]+)" \ +| rex "(,|}\.\.\.) savedsearch_name=\"(?P<savedsearch_name>[^\"]*)\"," \ +| rex "(terminated|closed): search_id=(?P<search_id>[^,]+)" \ +| regex search="^(litsearch|mcatalog|mstats|mlitsearch|litmstats|tstats|presummarize)" \ +| rex field=search max_match=50 "(?s)\|?\s*(mlitsearch)\s+.*?\[(?P<subsearch>.*?)\]\s*(\||$)" \ +| rex field=search "(?s)(?P<prepipe>\s*\|?([^\|]+))" \ +| nomv subsearch \ +| eval subsearch=if(isnull(subsearch),"",subsearch) \ +| eval prepipe = prepipe . " " . subsearch \ +| eval search=prepipe \ +| rex mode=sed field=search "s/search index=\s*\S+\s+index\s*=/search index=/g" \ + ```Extract out index= or index IN (a,b,c) but avoid NOT index in (...) and NOT index=... and also NOT (...anything) statements. \ + The (index=* OR index=_*) index=<specific index> pattern is a common use case for Enterprise Security, and some individuals like doing a similar trick, so remove the index=*... as this is not a wildcard index search``` \ +| rex field=search "(?P<esstylewildcard>\(\s*index=_\*\s+OR\s+index=\*\s+\))" \ + ```Extract out index= or index IN (a,b,c) but avoid NOT index in (...) and NOT index=... and also NOT (...anything) statements``` \ +| rex field=search "(?s)(NOT\s+index(\s*=\s*|::)[^ ]+)|(NOT\s+\([^\)]+\))|(index(\s*=\s*|::)(?P<indexregex>[\*A-Za-z0-9-_]+))" max_match=50 \ +| rex field=search "(?s)(NOT\s+index(\s*=\s*|::)[^ ]+)|(NOT\s+\([^\)]+\))|(index(\s*=\s*|::)\"?(?P<indexregex2>[\*A-Za-z0-9-_]+))" max_match=50 \ +| rex field=search "\s+(?P<skipping>\.\.\.\{skipping \d+ bytes\}\.\.\.)" \ + ```If skipping is in the logs as in index=abc- ...{skipping 46464 bytes}..., then drop the last index found in the regex as it is likely invalid``` \ +| eval indexregex=if(isnotnull(skipping),mvindex(indexregex,0,-2),indexregex) \ +| eval indexregex2=if(isnotnull(skipping),mvindex(indexregex2,0,-2),indexregex2) \ +| eval indexes=mvappend(indexregex,indexregex2) \ +| eval indexes=if(isnotnull(esstylewildcard),mvfilter(NOT match(indexes,"^_?\*$")),indexes) \ +| eval multi=if(mvcount(mvdedup(indexes))>1,"true","false") \ +| eval short=mvmap(indexes,if(len(indexes)<=3,"True",null())) \ +| eval short=if(isnull(short),"False","True") \ +| rex field=search_id "^remote_(?P<sid>.*)" \ +| rex "search_id=[^,]+,\s+server=(?P<server>[^,]+)" \ +| eval server_with_underscore = server.
"_" \ +| eval sid=replace(sid, server_with_underscore, "") \ +| eval search_head=server \ +| `search_type_from_sid(sid)` \ +| `base64decode(base64username)` \ +| eval username3="unknown" \ +| eval user=coalesce(username, base64username, username3) \ +| rex field=search "^(?P<presummarize>presummarize)\s+" \ +| eval type=if(isnotnull(presummarize),"acceleration",type) \ +| eval search_head_cluster=`search_head_cluster` \ +| eval indexer_cluster=`indexer_cluster_name(host)` \ + ```If you use the TERM(starting) you get the apiStartTime/apiEndTime, or you could join them in stats or similar...however this works to obtain which indexes are used. Note that you would need to build something similar to 'SearchHeadLevel - Search Queries summary non-exact match' to be able to translate the wildcards into something more useful, but there would be a lot of guesswork involved if you do not have usernames+server names+roles...(which is why audit logs work better for this)``` \ +| rex "search_rawdata_bucketcache_error=[^,]+, search_rawdata_bucketcache_miss=(?P<cache_rawdata_miss>[^,]+), search_index_bucketcache_error=[^,]+, search_index_bucketcache_hit=(?P<cache_index_hit>[^,]+), search_index_bucketcache_miss=(?P<cache_index_miss>[^,]+), search_rawdata_bucketcache_hit=(?P<cache_rawdata_hit>[^,]+), search_rawdata_bucketcache_miss_wait=(?P<cache_rawdata_miss_wait>[^,]+), search_index_bucketcache_miss_wait=(?P<cache_index_miss_wait>[^,]+)" \ +| `base64decode(base64appname)` \ +| eval app3="N/A" \ +| eval app=coalesce(app,base64appname,app3) \ +| stats dc(search_id) AS count, avg(elapsedTime) AS avg_total_run_time, max(elapsedTime) AS max_total_run_time, median(elapsedTime) AS median_total_run_time, avg(scan_count) AS avg_scan_count, max(scan_count) AS max_scan_count, min(scan_count) AS min_scan_count, median(scan_count) AS median_scan_count, sum(cache_rawdata_miss) AS cache_rawdata_miss, sum(cache_index_hit) AS cache_index_hit, sum(cache_index_miss) AS cache_index_miss, sum(cache_rawdata_hit) AS cache_rawdata_hit, sum(cache_rawdata_miss_wait) AS cache_rawdata_miss_wait, sum(cache_index_miss_wait) AS cache_index_miss_wait by user, search_head_cluster, indexes, indexer_cluster, type, multi, short, app \ +| regex indexes="\*" \ +| eval indexes=lower(indexes) \ +| lookup splunkadmins_indexlist_by_cluster indexer_cluster \ +| makemv index tokenizer=(\S+) \ +| streamfilterwildcard pattern=indexes fieldname=indexes index \ +| makemv indexes tokenizer=(\S+) \ +| stats sum(count) AS count, avg(avg_total_run_time) AS avg_total_run_time, max(max_total_run_time) AS max_total_run_time, median(median_total_run_time) AS median_total_run_time, avg(avg_scan_count) AS avg_scan_count, max(max_scan_count) AS max_scan_count, min(min_scan_count) AS min_scan_count, median(median_scan_count) AS median_scan_count, sum(cache_rawdata_miss) AS cache_rawdata_miss, sum(cache_index_hit) AS cache_index_hit, sum(cache_index_miss) AS cache_index_miss, sum(cache_rawdata_hit) AS cache_rawdata_hit, sum(cache_rawdata_miss_wait) AS cache_rawdata_miss_wait, sum(cache_index_miss_wait) AS cache_index_miss_wait by indexes, indexer_cluster, user, search_head_cluster, type, multi, short, app \ +| eval prefix="platform_stats.remote_searches.per_index.nonexact." \ +| addinfo \ +| rename info_max_time AS _time \ +| fields - info_* \ + ```| mcollect index=a_metrics_index split=true prefix_field=prefix search_head_cluster, indexer_cluster, type, user, indexes, multi, short, app. 
Below is useful if you instead use summary indexing for metrics in newer Splunk versions...in Splunk 8.0.x delete the below lines``` \ +| rename * AS platform_stats.remote_searches.per_index.nonexact.* \ +| rename platform_stats.remote_searches.per_index.nonexact.search_head_cluster AS search_head_cluster platform_stats.remote_searches.per_index.nonexact.indexer_cluster AS indexer_cluster, platform_stats.remote_searches.per_index.nonexact.type AS type, platform_stats.remote_searches.per_index.nonexact.user AS user, platform_stats.remote_searches.per_index.nonexact.indexes AS indexes, platform_stats.remote_searches.per_index.nonexact.multi AS multi, platform_stats.remote_searches.per_index.nonexact.short AS short, platform_stats.remote_searches.per_index.nonexact.app AS app \ +| fields - prefix + +[SearchHeadLevel - Index list by cluster report] +action.email.useNSSubject = 1 +alert.track = 0 +description = Report only? Yes. This report outputs a list of indexes available per indexer cluster. Used by other reports such as IndexerLevel - RemoteSearches Indexes Stats Wilcard +dispatch.earliest_time = -30d@d +dispatch.latest_time = now +display.events.fields = ["index","sourcetype","host"] +display.events.list.drilldown = none +display.events.list.wrap = 0 +display.events.maxLines = 100 +display.events.raw.drilldown = none +display.events.rowNumbers = 1 +display.events.table.drilldown = 0 +display.general.type = statistics +display.page.search.mode = fast +display.page.search.tab = statistics +display.statistics.drilldown = none +display.statistics.wrap = 0 +display.visualizations.charting.chart = area +display.visualizations.show = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = | eventcount summarize=false index=* OR index=_* \ +| eval indexer_cluster=`indexer_cluster_name(server)` \ +| stats count by index, indexer_cluster \ +| fields - count \ +| outputlookup splunkadmins_indexlist_by_cluster + +[IndexerLevel - SmartStore - Bucket cache errors audit logs] +action.keyindicator.invert = 0 +alert.suppress = 1 +alert.suppress.period = 3h +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 3 +counttype = number of events +cron_schedule = 18 */4 * * * +description = Chance the alert requires action? Moderate. 
The audit logs from the search tier are advising of one or more bucket cache errors +dispatch.earliest_time = -4h +dispatch.latest_time = now +display.events.fields = ["host","source","sourcetype"] +display.general.type = statistics +display.page.search.tab = statistics +display.visualizations.charting.chart = area +enableSched = 1 +quantity = 0 +relation = greater than +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = index=_audit "info=completed" search_id!="'SummaryDirector_*" search_id!="'rsa_*" invocations_command_search_index_bucketcache_error>0 OR invocations_command_search_rawdata_bucketcache_error>0 \ +| eval invocations_command_search_hit=invocations_command_search_index_bucketcache_hit + invocations_command_search_rawdata_bucketcache_hit, invocations_command_search_miss = invocations_command_search_index_bucketcache_miss + invocations_command_search_rawdata_bucketcache_miss \ +| table _time, invocations_command_search_index_bucketcache_error, invocations_command_search_rawdata_bucketcache_error, total_run_time, user, has_error_msg, invocations_command_search_hit, invocations_command_search_miss, duration_command_*_miss, search_id +disabled = 1 + +[IndexerLevel - RemoteSearches find all time searches] +action.email.useNSSubject = 1 +alert.track = 0 +description = Report only? Yes. The remote_searches.log is showing that there may be an all time search running on the indexing tier; this may or may not be an issue. Note this can also be detected via the audit.log should you have access to the audit.log of all search heads. This log does miss the scenario where _index_earliest is passed via API as per the comments on https://ideas.splunk.com/ideas/E-I-49. Note that you probably want to run this on a single indexer...
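For reference, a rough audit-log sketch of the same all-time check (an assumption to verify in your environment: in index=_audit events an all-time search typically reports search_et=N/A and search_lt=N/A):

index=_audit action=search info=granted search_et=N/A search_lt=N/A
| table _time, user, savedsearch_name, search_et, search_lt, search

The report below instead works purely from remote_searches.log, which helps when the audit logs of the incoming search heads are not forwarded to you.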
+dispatch.earliest_time = -4h +dispatch.latest_time = now +display.events.fields = ["index","sourcetype","host"] +display.events.list.drilldown = none +display.events.list.wrap = 0 +display.events.maxLines = 100 +display.events.raw.drilldown = none +display.events.rowNumbers = 1 +display.events.table.drilldown = 0 +display.general.type = statistics +display.page.search.mode = fast +display.page.search.tab = statistics +display.statistics.drilldown = none +display.statistics.wrap = 0 +display.visualizations.charting.chart = area +display.visualizations.show = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = index=_internal `splunkadmins_indexer_remotesearches_alltime` source=*remote_searches.log sourcetype=splunkd_remote_searches StreamedSearch TERM(starting:) NOT TERM(terminated:) "search='litsearch" OR "search='mlitsearch" OR "search='mcatalog" OR "search='mstats" OR "search='litmstats" OR "search='tstats" OR "search='presummarize" NOT "litsearch index=mobieos | fields" \ +| regex search!="^presummarize (tstats=t )?maintain=\"" \ +| eval start_time=strptime(apiStartTime, "%a %b %d %H:%M:%S %Y") \ +| eval start_time=if(apiStartTime="ZERO_TIME","ZERO_TIME",start_time) \ +| eval now=now() \ +| where start_time<(now-31622400) OR start_time="ZERO_TIME" \ +| rex field=search "earliest=(?P<earliest_time_field>\S+)" \ +| eval earliest_time2=if(isnotnull(earliest_time_field),earliest_time_field,"-1s") \ +| eval start_time_relative=relative_time(now(), earliest_time2) \ +| eval diff=now() - start_time_relative \ +| where diff>31622400 OR isnull(earliest_time_field) \ +| eval start_time=if(isnotnull(earliest_time_field),strftime(start_time_relative, "%a %b %d %H:%M:%S %Y"),apiStartTime) \ +| stats latest(_time) AS _time, values(search) AS search, values(savedsearch_name) AS savedsearch_name, values(start_time) AS start_time values(apiStartTime) AS apiStartTime, values(apiEndTime) AS apiEndTime by search_id, server \ +| table _time, server, search_id, apiStartTime, start_time, savedsearch_name, search, apiEndTime + +[SearchHeadLevel - Accelerated DataModels with wildcard or no index specified] +action.keyindicator.invert = 0 +alert.suppress = 0 +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 4 +counttype = number of events +cron_schedule = 0 7 * * * +description = Chance the alert requires action? High. An accelerated data model searching over all indexes or many indexes can be a minor issue for the Splunk indexing tier, or a major issue if using SmartStore on the indexers in combination with a large number of indexes...the receipt files are stored per-index and each indexer will query for them on deletion (a request flood). Search Head specific?
Yes +dispatch.earliest_time = -1h +dispatch.latest_time = now +display.events.fields = ["host","source","sourcetype"] +display.general.type = statistics +display.page.search.tab = statistics +display.visualizations.charting.chart = area +enableSched = 1 +quantity = 0 +relation = greater than +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = | rest /servicesNS/-/-/datamodel/model `splunkadmins_restmacro` search=eai:type=datamodel f=description f=acceleration f=eai:acl* \ +| spath input=description path=objects{}.objectSearch output=objectSearch \ +| spath input=acceleration path=enabled output=accelerationEnabled \ +| where accelerationEnabled!="false" \ +| rename eai:acl.app AS app, eai:acl.sharing AS sharing \ +| table splunk_server, title, app, sharing, objectSearch, accelerationEnabled, updated \ +| mvexpand objectSearch \ +| rex field=objectSearch mode=sed "s/\(index=\* OR index=_\*\)/indexwildcard/g" \ +| rex field=objectSearch "(?P<index>index(\s*=\s*\S+|\s+IN\s+\([^\)]+))" \ +| where isnull(index) OR match(index, "\*") \ +| stats values(objectSearch) AS objectSearch by splunk_server, title, index, app, sharing, accelerationEnabled, updated +disabled = 1 + +[IndexerLevel - RemoteSearches find datamodel acceleration with wildcards] +action.email.useNSSubject = 1 +alert.track = 0 +description = Report only? Yes. The remote_searches.log is showing that an accelerated datamodel appears to be using an index=* wildcard; when using SmartStore this can cause serious issues with object store request numbers https://ideas.splunk.com/ideas/EID-I-677 +dispatch.earliest_time = -4h +dispatch.latest_time = now +display.events.fields = ["index","sourcetype","host"] +display.events.list.drilldown = none +display.events.list.wrap = 0 +display.events.maxLines = 100 +display.events.raw.drilldown = none +display.events.rowNumbers = 1 +display.events.table.drilldown = 0 +display.general.type = statistics +display.page.search.mode = fast +display.page.search.tab = statistics +display.statistics.drilldown = none +display.statistics.wrap = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = index=_internal `indexerhosts` source=*remote_searches.log sourcetype=splunkd_remote_searches source="/opt/splunk/var/log/splunk/remote_searches.log" StreamedSearch TERM(starting:) NOT TERM(terminated:) search_id=*SummaryDirector* \ +| regex "index=\"?\*" \ +| rex mode=sed field=search "s/\(\s*index=\*\s+OR\s+index=_\*\s*\).*?index(=|\s+IN\s+\()/ESstylewildcard index=/g" \ +| regex search="index=\"?\*" \ +| rex "search_id=[^,]+,\s+server=(?P<search_head>[^,]+)" \ +| eval search_head_cluster=`search_head_cluster` \ +| eval indexer_cluster=`indexer_cluster_name(host)` \ +| stats count, values(search) AS search by search_head_cluster, indexer_cluster + +[IndexerLevel - IndexWriter pause duration] +action.email.useNSSubject = 1 +alert.track = 0 +cron_schedule = 38 * * * * +description = Report only? Yes.
This report will measure the period of time that is mentioned as "paused" by the IndexWriter and then "Released" from the throttle +dispatch.earliest_time = -65m@m +dispatch.latest_time = -5m@m +display.events.fields = ["index","sourcetype","host"] +display.general.type = statistics +enableSched = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = index=_internal sourcetype=splunkd `splunkadmins_splunkd_source` `indexerhosts` INFO (IndexWriter paused) OR Released \ +| eval state=if(searchmatch("Released indexing throttle"),"stopped","started") \ +| sort 0 _time \ +| streamstats current=f global=f window=1 values(state) AS prev_state, latest(_time) AS start by host \ +| streamstats current=f global=f window=1 values(bucket) AS bucket by host, idx \ +| search state="stopped" AND prev_state="started" \ +| eval duration=_time-start \ +| table host, idx, duration, _time, start, bucket \ +| eval start=strftime(start, "%Y-%m-%d %H:%M:%S.%3N") + +[SearchHeadLevel - platform_stats.users savedsearches] +action.email.useNSSubject = 1 +alert.track = 0 +cron_schedule = 33 */4 * * * +description = Report only? Yes. Metrics? Yes. This summary (mcollect) search attempts to measure ... +dispatch.earliest_time = -24h@h +dispatch.latest_time = @h +display.events.fields = ["index","sourcetype","host","source"] +display.page.search.tab = statistics +enableSched = 0 +realtime_schedule = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = | rest /servicesNS/-/-/saved/searches timeout=900 `splunkadmins_restmacro` f=title f=is_scheduled f=eai:acl* f=disabled \ +| eval app='eai:acl.app', user='eai:acl.owner', search_head=splunk_server \ +| eval search_head_cluster=`search_head_cluster` \ +| eval scheduled=case(disabled==1,0,disabled==0 AND is_scheduled==1,1,1=1,0) \ +| stats count by search_head_cluster, user, scheduled, app\ +| eval prefix="user_stats.savedsearches." \ +| eval _time=now() \ +| fields - info_*\ + ```mcollect index=a_metrics_index split=true prefix_field=prefix search_head_cluster, user, scheduled, app``` + +[SearchHeadLevel - platform_stats.users dashboards] +action.email.useNSSubject = 1 +alert.track = 0 +cron_schedule = 33 */4 * * * +description = Report only? Yes. Metrics? Yes. This summary (mcollect) search attempts to measure ... +dispatch.earliest_time = -24h@h +dispatch.latest_time = @h +display.events.fields = ["index","sourcetype","host","source"] +display.page.search.tab = statistics +enableSched = 0 +realtime_schedule = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = | rest /servicesNS/-/-/data/ui/views timeout=900 `splunkadmins_restmacro` f=title f=eai:acl*\ +| eval app='eai:acl.app', user='eai:acl.owner', search_head=splunk_server \ +| eval search_head_cluster=`search_head_cluster` \ +| stats count by search_head_cluster, user, app\ +| eval prefix="user_stats.dashboards." \ +| eval _time=now() \ +| fields - info_* \ + ```mcollect index=a_metrics_index split=true prefix_field=prefix search_head_cluster, user, app``` + +[AllSplunkLevel - No recent metrics.log data] +alert.severity = 4 +alert.suppress = 0 +alert.track = 1 +counttype = number of events +cron_schedule = 53 * * * * +description = Chance the alert requires action? High. 
If the metrics.log disappears for a period of time, either the indexing tier is very busy or the forwarder in question has failed and stopped sending metrics.log files +dispatch.earliest_time = -60m@m +dispatch.latest_time = now +display.events.fields = ["host","source","sourcetype"] +display.general.type = statistics +display.page.search.tab = statistics +display.visualizations.chartHeight = 628 +enableSched = 1 +quantity = 0 +relation = greater than +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = | tstats prestats=t count where index=_internal `splunkenterprisehosts` `splunkadmins_metrics_source` by host, _time span=5m \ +```This alert attempts to detect when a forwarder or Splunk server stops sending logs for an extended period of time outside a shutdown...```\ +| search ```Exclude the shutdown times``` NOT \ + [ `splunkadmins_shutdown_list(splunkenterprisehosts,30,30)`] \ +| timechart limit=0 aligntime=latest span=5m count by host \ +| fillnull \ +| untable _time, host, count \ +| stats max(_time) AS mostRecent, min(_time) AS firstSeen, last(count) AS lastCount by host \ +| where lastCount=0 \ +| eval logMessages="Zero log entries found at this time, check that the Splunk server is still running/working as expected" \ +| fields - lastCount \ +| eval mostRecent = strftime(mostRecent, "%+"), firstSeen=strftime(firstSeen, "%+") +disabled = 1 + +[SearchHeadLevel - SmartStore cache misses - dashboards] +action.email.useNSSubject = 1 +alert.track = 0 +cron_schedule = 33 */4 * * * +description = Report only? Yes. This report is designed to find the number of cache misses by dashboards, originally created by Nico Van Der Walt +dispatch.earliest_time = -4h +dispatch.latest_time = now +display.events.fields = ["index","sourcetype","host","source"] +display.page.search.tab = statistics +enableSched = 0 +realtime_schedule = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = `searchheadhosts` (index=_audit action=search search_id NOT typeahead NOT "search_id='rsa_*") ```Original version by Nico Van Der Walt, modified by Gareth Anderson``` \ +invocations_command_search_index_bucketcache_miss>0 OR invocations_command_search_rawdata_bucketcache_miss>0 TERM(info=*) TERM(UI:Dashboard:*) \ +| eval total_days_searched=(search_lt-search_et)/86400 \ +| eval total_hours_searched=total_days_searched*24 \ +| eval total_hours_searched=round(total_hours_searched,1) \ +| eval total_days_searched=round(total_days_searched,0) \ +| eval search_id=trim(search_id,"\'") \ +| eval search_id=coalesce(search_id,sid) \ +| eval origSid=search_id \ +| rex field=search_id "subsearch_(?<search_id>.*)_\d+\.\d+" \ +| eval api_et=if(api_et="N/A", search_et, api_et) \ +| eval total_hours_searched=if(api_et="N/A", "AllTime",total_hours_searched) \ +| eval total_days_searched=if(api_et="N/A", "AllTime",total_days_searched) \ +| eval provenance=if(provenance="N/A",NULL,provenance) \ +| eval provenance=if(provenance="UI:LocateData",NULL,provenance) \ +| rex "(?s), search='(?P<search>.*)\]$" \ +| eval search=if(match(search,"^'"),mvindex(search,1),search) \ +| stats latest(_time) AS mostRecent, values(host) as host sum(duration_command_search_rawdata_bucketcache_miss) AS duration__raw_cache_miss sum(invocations_command_search_index_bucketcache_miss) as count_index_cache_miss sum(invocations_command_search_rawdata_bucketcache_miss) as count_rawdata_cache_miss values(total_hours_searched) AS total_hours_searched values(total_days_searched) AS total_days_searched
values(user) AS users last(search) AS search values(savedsearch_name) AS savedsearch_name max(total_run_time) AS run_time values(result_count) AS result_count values(event_count) AS event_count values(searched_buckets) AS searched_buckets values(info) AS info values(provenance) AS provenance dc(origSid) AS numofsearchesinquery by search_id \ +| `search_type_from_sid(search_id)` \ +| `base64decode(base64appname)` \ +| eval app3="N/A", app=coalesce(app,app2,base64appname,app3) \ +| eval total_cache_miss=count_index_cache_miss+count_rawdata_cache_miss \ +| search total_cache_miss>0 \ +| search provenance=*Dashboard* \ +| eval total_hours_searched=round(total_hours_searched,1) \ +| rex field=search "(?s)(NOT\s+index(\s*=\s*|::)[^ ]+)|(NOT\s+\([^\)]+\))|(index(\s*=\s*|::)\"?(?P<indexregex>[\*A-Za-z0-9-_]+))" max_match=50 \ +| rex field=search "(?s)(NOT\s+index\s+[iI][nN]\s*\([^\)]+)|(index\s+[iI][nN]\s*\((?P<indexin>([^\)\"]+)|\"[^\)\"]+\"))" max_match=50 \ +| makemv delim="," indexin \ +| eval indexes=mvappend(indexregex,indexin) \ +| eval indexes=mvmap(indexes, replace(lower(indexes), "\"", "")) \ +| eval indexes=mvmap(indexes, trim(replace(indexes, "'", ""))) \ +| eval indexes=mvdedup(indexes) \ +| eval has_pipe=if(match(search,"\|"),"true",null()) \ +| rex field=search "(?P<search>[^\|]+\|)" \ +| eval search = if(isnotnull(has_pipe),search . " ... (trimmed)",search)\ +| stats latest(mostRecent) AS mostRecent count as number_of_searches_run dc(savedsearch_name) as num_panels values(host) as host max(run_time) AS max_run_time avg(run_time) AS avg_run_time sum(run_time) AS sum_run_time sum(total_cache_miss) as total_cache_miss sum(result_count) AS result_count sum(event_count) AS event_count sum(searched_buckets) AS searched_buckets values(users) as users, values(indexes) AS indexes, values(search) AS search, values(info) AS info by provenance \ +| eval avg_run_time=round(avg_run_time,1) \ +| eval provenance=replace(provenance, "(?i)UI:Dashboard:", ""), mostRecent=strftime(mostRecent,"%+") \ +| rename provenance as dashboard \ +| sort - total_cache_miss + +[SearchHeadLevel - SmartStore cache misses - savedsearches] +action.email.useNSSubject = 1 +alert.track = 0 +cron_schedule = 33 */4 * * * +description = Report only? Yes. 
This report is designed to find the number of cache misses by saved searches, originally created by Nico Van Der Walt +dispatch.earliest_time = -4h +dispatch.latest_time = now +display.events.fields = ["index","sourcetype","host","source"] +display.page.search.tab = statistics +enableSched = 0 +realtime_schedule = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = `searchheadhosts` (index=_audit action=search search_id NOT typeahead NOT "search_id='rsa_*") ```Original version by Nico Van Der Walt, modified by Gareth Anderson```\ +invocations_command_search_index_bucketcache_miss>0 OR invocations_command_search_rawdata_bucketcache_miss>0 TERM(info=*) TERM(UI:Search) \ +| eval total_days_searched=(search_lt-search_et)/86400 \ +| eval total_hours_searched=total_days_searched*24 \ +| eval total_hours_searched=round(total_hours_searched,1) \ +| eval total_days_searched=round(total_days_searched,0) \ +| eval search_id=trim(search_id,"\'") \ +| eval search_id=coalesce(search_id,sid) \ +| eval origSid=search_id \ +| rex field=search_id "subsearch_(?<search_id>.*)_\d+\.\d+" \ +| eval api_et=if(api_et="N/A", search_et, api_et) \ +| eval total_hours_searched=if(api_et="N/A", "AllTime",total_hours_searched) \ +| eval total_days_searched=if(api_et="N/A", "AllTime",total_days_searched) \ +| eval provenance=if(provenance="N/A",NULL,provenance) \ +| eval provenance=if(provenance="UI:LocateData",NULL,provenance) \ +| rex "(?s), search='(?P<search>.*)\]$" \ +| eval search=if(match(search,"^'"),mvindex(search,1),search) \ +| stats latest(_time) AS mostRecent, values(host) as host sum(duration_command_search_rawdata_bucketcache_miss) AS duration__raw_cache_miss sum(invocations_command_search_index_bucketcache_miss) as count_index_cache_miss sum(invocations_command_search_rawdata_bucketcache_miss) as count_rawdata_cache_miss values(total_hours_searched) AS total_hours_searched values(total_days_searched) AS total_days_searched values(user) AS users last(search) AS search values(savedsearch_name) AS savedsearch_name max(total_run_time) AS run_time values(result_count) AS result_count values(event_count) AS event_count values(searched_buckets) AS searched_buckets values(info) AS info values(provenance) AS provenance dc(origSid) AS numofsearchesinquery by search_id \ +| eval total_cache_miss=count_index_cache_miss+count_rawdata_cache_miss \ +| search total_cache_miss>0 \ +| search provenance=UI:Search \ +| eval total_hours_searched=round(total_hours_searched,1) \ +| `search_type_from_sid(search_id)` \ +| `base64decode(base64appname)` \ +| eval app3="N/A", app=coalesce(app,app2,base64appname,app3) \ +| stats latest(mostRecent) AS mostRecent, count as number_of_runs values(host) as host values(total_hours_searched) AS total_hours_searched values(total_days_searched) AS total_days_searched max(run_time) AS max_run_time avg(run_time) AS avg_run_time sum(run_time) AS sum_run_time sum(total_cache_miss) as total_cache_miss max(result_count) AS result_count max(event_count) AS event_count max(searched_buckets) AS searched_buckets values(info) AS info values(numofsearchesinquery) AS numofsearchesinquery, values(app) AS app by users search \ +| rex field=search "(?s)(NOT\s+index(\s*=\s*|::)[^ ]+)|(NOT\s+\([^\)]+\))|(index(\s*=\s*|::)\"?(?P<indexregex>[\*A-Za-z0-9-_]+))" max_match=50 \ +| rex field=search "(?s)(NOT\s+index\s+[iI][nN]\s*\([^\)]+)|(index\s+[iI][nN]\s*\((?P<indexin>([^\)\"]+)|\"[^\)\"]+\"))" max_match=50 \ +| makemv delim="," indexin \ +| eval
indexes=mvappend(indexregex,indexin) \ +| eval indexes=mvmap(indexes, replace(lower(indexes), "\"", "")) \ +| eval indexes=mvmap(indexes, trim(replace(indexes, "'", ""))) \ +| eval indexes=mvdedup(indexes) \ +| rex max_match=100 field=search "tag=(?<tags>[^\s+\||\)]+)" \ +| rex max_match=100 field=search "eventtype=(?<eventtypes>[^\s+\||\)]+)" \ +| rex max_match=100 field=search "(?<macros>\`[^\s]+\`)" \ +| eval has_pipe=if(match(search,"\|"),"true",null()) \ +| rex field=search "(?P<search>[^\|]+\|)" \ +| eval search = if(isnotnull(has_pipe),search . " ... (trimmed)",search), mostRecent=strftime(mostRecent,"%+") \ +| fields - has_pipe, indexin, indexregex \ +| eval avg_run_time=round(avg_run_time,1) \ +| sort - total_cache_miss + +[SearchHeadLevel - SmartStore cache misses - combined] +action.email.useNSSubject = 1 +alert.track = 0 +cron_schedule = 33 */4 * * * +description = Report only? Yes. This report is designed to find the number of cache misses by saved searches or dashboards, originally created by Nico Van Der Walt +dispatch.earliest_time = -4h +dispatch.latest_time = now +display.events.fields = ["index","sourcetype","host","source"] +display.page.search.tab = statistics +enableSched = 0 +realtime_schedule = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = `searchheadhosts` (index=_audit action=search search_id NOT typeahead NOT "search_id='rsa_*") ```Original version by Nico Van Der Walt, modified by Gareth Anderson```\ +invocations_command_search_index_bucketcache_miss>0 OR invocations_command_search_rawdata_bucketcache_miss>0 TERM(info=*) TERM(UI:Dashboard:*) OR TERM(UI:Search) \ +| eval total_days_searched=(search_lt-search_et)/86400 \ +| eval total_hours_searched=total_days_searched*24 \ +| eval total_hours_searched=round(total_hours_searched,1) \ +| eval total_days_searched=round(total_days_searched,0) \ +| eval search_id=trim(search_id,"\'") \ +| eval search_id=coalesce(search_id,sid) \ +| eval origSid=search_id \ +| rex field=search_id "subsearch_(?<search_id>.*)_\d+\.\d+" \ +| eval api_et=if(api_et="N/A", search_et, api_et) \ +| eval total_hours_searched=if(api_et="N/A", "AllTime",total_hours_searched) \ +| eval total_days_searched=if(api_et="N/A", "AllTime",total_days_searched) \ +| eval provenance=if(provenance="N/A",NULL,provenance) \ +| eval provenance=if(provenance="UI:LocateData",NULL,provenance) \ +| rex "(?s), search='(?P<search>.*)\]$" \ +| eval search=if(match(search,"^'"),mvindex(search,1),search) \ +| stats latest(_time) AS mostRecent, values(host) as host sum(duration_command_search_rawdata_bucketcache_miss) AS duration__raw_cache_miss sum(invocations_command_search_index_bucketcache_miss) as count_index_cache_miss sum(invocations_command_search_rawdata_bucketcache_miss) as count_rawdata_cache_miss values(total_hours_searched) AS total_hours_searched values(total_days_searched) AS total_days_searched values(user) AS users last(search) AS search values(savedsearch_name) AS savedsearch_name max(total_run_time) AS run_time values(result_count) AS result_count values(event_count) AS event_count values(searched_buckets) AS searched_buckets values(info) AS info values(provenance) AS provenance dc(origSid) AS numofsearchesinquery by search_id \ +| `search_type_from_sid(search_id)` \ +| `base64decode(base64appname)` \ +| eval app3="N/A", app=coalesce(app,app2,base64appname,app3) \ +| eval total_cache_miss=count_index_cache_miss+count_rawdata_cache_miss \ +| search total_cache_miss>0 \ +| eval 
total_hours_searched=round(total_hours_searched,1) \ +| stats latest(mostRecent) AS mostRecent, count as number_of_runs, values(host) as host values(total_hours_searched) AS total_hours_searched values(total_days_searched) AS total_days_searched max(run_time) AS max_run_time avg(run_time) AS avg_run_time sum(run_time) AS sum_run_time sum(total_cache_miss) as total_cache_miss max(result_count) AS result_count max(event_count) AS event_count max(searched_buckets) AS searched_buckets values(info) AS info values(numofsearchesinquery) AS numofsearchesinquery, values(provenance) AS provenance, values(app) AS app by users search \ +| rex field=search "(?s)(NOT\s+index(\s*=\s*|::)[^ ]+)|(NOT\s+\([^\)]+\))|(index(\s*=\s*|::)\"?(?P<indexregex>[\*A-Za-z0-9-_]+))" max_match=50 \ +| rex field=search "(?s)(NOT\s+index\s+[iI][nN]\s*\([^\)]+)|(index\s+[iI][nN]\s*\((?P<indexin>([^\)\"]+)|\"[^\)\"]+\"))" max_match=50 \ +| makemv delim="," indexin \ +| eval indexes=mvappend(indexregex,indexin) \ +| eval indexes=mvmap(indexes, replace(lower(indexes), "\"", "")) \ +| eval indexes=mvmap(indexes, trim(replace(indexes, "'", ""))) \ +| eval indexes=mvdedup(indexes) \ +| eval has_pipe=if(match(search,"\|"),"true",null())\ +| rex max_match=100 field=search "tag=(?<tags>[^\s+\||\)]+)" \ +| rex max_match=100 field=search "eventtype=(?<eventtypes>[^\s+\||\)]+)" \ +| rex max_match=100 field=search "(?<macros>\`[^\s]+\`)" \ +| rex field=search "(?P<search>[^\|]+\|)" \ +| eval search = if(isnotnull(has_pipe),search . " ... (trimmed)",search)\ +| fields - has_pipe, indexin, indexregex \ +| eval avg_run_time=round(avg_run_time,1) \ +| eval provenance=replace(provenance, "(?i)UI:Dashboard:", ""), mostRecent=strftime(mostRecent,"%+") \ +| rename provenance as dashboard\ +| sort - total_cache_miss\ +| table total_cache_miss, total_hours_searched, total_days_searched, mostRecent, users, number_of_runs, max_run_time, avg_run_time, sum_run_time, indexes, result_count, event_count, searched_buckets, info, numofsearchesinquery, dashboard, app, eventtypes, macros, tags, search + +[IndexerLevel - SmartStore cache misses - remote_searches] +action.email.useNSSubject = 1 +alert.track = 0 +cron_schedule = 33 */4 * * * +description = Report only? Yes. 
This report is designed to find the number of cache misses at the indexing tier, based on a search from Richard Morgan's dashboard, https://github.com/silkyrich/cluster_health_tools/blob/master/default/data/ui/views/debug_cache_manager_misses.xml +dispatch.earliest_time = -4h +dispatch.latest_time = now +display.events.fields = ["index","sourcetype","host","source"] +display.page.search.tab = statistics +enableSched = 0 +realtime_schedule = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = index=_internal `indexerhosts` sourcetype=splunkd_remote_searches StreamedSearch Streamed search connection terminated: OR closed: search_id=* rawdata_bucketcache_miss>0 OR index_bucketcache_miss>0 ```based on a search from Richard Morgan's dashboard, https://github.com/silkyrich/cluster_health_tools/blob/master/default/data/ui/views/debug_cache_manager_misses.xml``` \ +| rex field=_raw "search_rawdata_bucketcache_error=(?<rawdata_bucketcache_error>[\d.]+)" \ +| rex field=_raw "search_rawdata_bucketcache_miss=(?<rawdata_bucketcache_miss>[\d.]+)" \ +| rex field=_raw "search_index_bucketcache_error=(?<index_bucketcache_error>[\d.]+)" \ +| rex field=_raw "search_index_bucketcache_hit=(?<index_bucketcache_hit>[\d.]+)" \ +| rex field=_raw "search_index_bucketcache_miss=(?<index_bucketcache_miss>[\d.]+)" \ +| rex field=_raw "search_rawdata_bucketcache_hit=(?<rawdata_bucketcache_hit>[\d.]+)" \ +| rex field=_raw "search_rawdata_bucketcache_miss_wait=(?<rawdata_bucketcache_miss_wait>[\d.]+)" \ +| rex field=_raw "search_index_bucketcache_miss_wait=(?<index_bucketcache_miss_wait>[\d.]+)" \ +| rex field=_raw "drop_count=(?<drop_count>[\d.]+)" \ +| rex field=_raw "scan_count=(?<scan_count>[\d.]+)" \ +| rex field=_raw "eliminated_buckets=(?<eliminated_buckets>[\d.]+)" \ +| rex field=_raw "considered_events=(?<considered_events>[\d.]+)" \ +| rex field=_raw "decompressed_slices=(?<decompressed_slices>[\d.]+)" \ +| rex field=_raw "events_count=(?<events_count>[\d.]+)" \ +| rex field=_raw "total_slices=(?<total_slices>[\d.]+)" \ +| rex field=_raw "considered_buckets=(?<considered_buckets>[\d.]+)" \ +| stats \ + sum(rawdata_bucketcache_error) as search_rawdata_bucketcache_error_sum\ + sum(rawdata_bucketcache_miss) as search_rawdata_bucketcache_miss_sum\ + sum(index_bucketcache_error) as search_index_bucketcache_error_sum\ + sum(index_bucketcache_hit) as search_index_bucketcache_hit_sum\ + sum(index_bucketcache_miss) as search_index_bucketcache_miss_sum\ + sum(rawdata_bucketcache_hit) as search_rawdata_bucketcache_hit_sum\ + sum(rawdata_bucketcache_miss_wait) as search_rawdata_bucketcache_miss_wait_sum\ + sum(index_bucketcache_miss_wait) as search_index_bucketcache_miss_wait_sum\ + min(_time) as time_min \ + max(_time) as time_max\ + sum(drop_count) as drop_count_sum\ + sum(scan_count) as scan_count_sum\ + sum(eliminated_buckets) as eliminated_buckets_sum\ + sum(considered_events) as considered_events_sum\ + sum(decompressed_slices) as decompressed_slices_sum\ + sum(events_count) as events_count_sum\ + sum(total_slices) as total_slices_sum\ + sum(considered_buckets) as considered_buckets_sum, values(search) AS search\ + by search_id server \ +| search search_id=remote_* \ +| eval cache_misses=search_rawdata_bucketcache_miss_sum + search_index_bucketcache_miss_sum \ +| sort - search_rawdata_bucketcache_miss_sum \ +| table search_id cache_misses * \ +| sort 0 - cache_misses + +[ForwarderLevel - Stopping all listening ports] +alert.severity = 4 +alert.suppress = 0
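The alert in the stanza below counts WARN/Stopping events per host per minute; a sketch that additionally flags hosts where the listeners stopped repeatedly within the window (the total_stops field and the threshold of 5 are illustrative additions, not part of the original alert):

index=_internal `heavyforwarderhosts` OR `indexerhosts` sourcetype=splunkd `splunkadmins_splunkd_source` TERM(WARN) TERM(Stopping)
| bin _time span=1m
| stats count by host, _time
| eventstats sum(count) AS total_stops by host ```total stop events per host over the search window```
| where total_stops>5
| sort - _time

As the description notes, repeated stops on a forwarder may indicate problems with the downstream indexers, while one-off stops often coincide with restarts.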
+alert.track = 1 +counttype = number of events +cron_schedule = 42 * * * * +description = Chance the alert requires action? Moderate. If the TCP listener ports stop temporarily it might be an issue with the downstream indexers (on a forwarder); if they stop for a long period or often enough, then this is likely to cause issues with upstream forwarders +dispatch.earliest_time = -60m@m +dispatch.latest_time = now +display.events.fields = ["host","source","sourcetype"] +display.general.type = statistics +display.page.search.tab = statistics +display.visualizations.chartHeight = 628 +enableSched = 1 +quantity = 0 +relation = greater than +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = index=_internal `heavyforwarderhosts` OR `indexerhosts` sourcetype=splunkd `splunkadmins_splunkd_source` TERM(WARN) TERM(Stopping) \ +| bin _time span=1m \ +| stats count by host, _time \ +| sort - _time +disabled = 1 + +[IndexerLevel - Buckets in cache] +action.email.useNSSubject = 1 +alert.track = 0 +cron_schedule = 33 */4 * * * +description = Report only? Yes. This report is an example search; note that since cacheman is an /admin/ endpoint it is undocumented and this may not work in all versions of Splunk (tested on 8.2.2.1) +dispatch.earliest_time = -4h +dispatch.latest_time = now +display.events.fields = ["index","sourcetype","host","source"] +display.page.search.tab = statistics +enableSched = 0 +realtime_schedule = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = | rest `splunkindexerhostsvalue` /services/admin/cacheman f=cm:bucket* count=0\ +| rex field=title "bid\|(?P<index>[^~]+)" \ +| stats min(cm:bucket.earliest_time) AS mintime by index\ +| eval days=round((now()-mintime)/60/60/24) + +[SearchHeadLevel - Excessive REST API usage] +alert.severity = 4 +alert.suppress = 0 +alert.track = 1 +counttype = number of events +cron_schedule = 42 * * * * +description = Chance the alert requires action? High. Excessive usage of the REST API by, for example, querying the jobs endpoint continuously without sleeping can result in the Splunk search head crashing due to excessive thread usage +dispatch.earliest_time = -60m@m +dispatch.latest_time = now +display.events.fields = ["host","source","sourcetype"] +display.general.type = statistics +display.page.search.tab = statistics +display.visualizations.chartHeight = 628 +enableSched = 1 +quantity = 0 +relation = greater than +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = ```attempt to detect overuse of the REST API by non-system users``` index=_internal `searchheadhosts` sourcetype=splunkd_access useragent!="Splunk/*" useragent!="Splunkd/*" user!=splunk-system-user user!=admin user!=- NOT "/results_preview" "/search/jobs/" clientip!="127.0.0.1" ```this is the Splunk internal httplib version proxying requests on behalf of clients; this will likely change on upgrade, current as of 8.2.2.1``` NOT (`splunkadmins_excessive_rest_api_httplib` "isProxyRequest=true")\ +| regex uri!="/control$" \ +| bin _time span=2m \ +| stats count by user, _time \ +| where count>`splunkadmins_excessive_rest_api_threshold` \ +| eval earliest=_time-120, latest=_time+120 \ +| eval query="index=_internal `searchheadhosts` sourcetype=splunkd_access useragent!=\"Splunk/*\" useragent!=\"Splunkd/*\" user!=splunk-system-user user!=admin user!=- NOT \"/results_preview\" \"/search/jobs/\" clientip!=\"127.0.0.1\" NOT (\"Python-httplib2/0.13.1 (gzip)\" \"isProxyRequest=true\") user=" . user .
" earliest=" . earliest . " latest=" . latest . " | regex uri!=\"/control$\" | rex field=uri \"/(?P<last_of_url>[^/]+$)\" | streamstats current=false last(_time) AS prev_time by last_of_url | eventstats count AS count_by_last_of_url by last_of_url | eval time_diff=if(isnull(prev_time),null(),prev_time-_time)" \ +| fields - earliest, latest\ +| sort - count +disabled = 1 + +[SearchHeadLevel - Splunk Scheduler logs have not appeared in the last] +alert.suppress = 0 +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 2 +counttype = number of events +cron_schedule = 57 2,6,10,14,18,22 * * * +description = Chance the alert requires action? Moderate. If the scheduler logs have stopped on a search head then there is likely an issue +dispatch.earliest_time = -4h@h +dispatch.latest_time = now +display.general.type = statistics +display.page.search.tab = statistics +display.visualizations.charting.chart = area +enableSched = 1 +quantity = 0 +relation = greater than +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = ```this works except in 8.2.5 it's slower than stats... | tstats count where index=_internal `searchheadhosts` source=*scheduler.log sourcetype=scheduler by host, _time span=1m``` \ +index=_internal `searchheadhosts` source=*scheduler.log sourcetype=scheduler \ +| bin _time span=1m \ +| stats count by host, _time \ +| timechart limit=0 span=5m aligntime=latest sum(count) AS count by host \ +| fillnull \ +| untable _time, host, count \ +| stats max(_time) AS mostRecent, min(_time) AS firstSeen, last(count) AS lastCount by host \ +| search ```Exclude time periods where shutdowns were occurring``` NOT [`splunkadmins_shutdown_time(searchheadhosts,60,60)`] \ +| where lastCount=0 \ +| eval logMessages="Zero log entries found at this time, this might be a Splunkd issue, please investigate" \ +| fields - lastCount \ +| eval mostRecent = strftime(mostRecent, "%+"), firstSeen=strftime(firstSeen, "%+") \ +| eval search_head=host \ +| eval search_head_cluster=`search_head_cluster` \ +| table host, firstSeen, mostRecent, logMessages, search_head_cluster +disabled = 1 + +[MonitoringConsole - Core dumps have appeared on the filesystem] +alert.suppress = 0 +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 2 +counttype = number of events +cron_schedule = 57 2,6,10,14,18,22 * * * +description = Chance the alert requires action? High. 
Core dumps are normally an issue that should be investigated with Splunk support if they are not a known issue, or if dmesg or the journal shows no OS-level issues at the time. Note that this can run from any search head, but the monitoring console may make more sense as it has connectivity to all instances +dispatch.earliest_time = -4h@h +dispatch.latest_time = now +display.general.type = statistics +display.page.search.tab = statistics +display.visualizations.charting.chart = area +enableSched = 1 +quantity = 0 +relation = greater than +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = | rest /services/admin/file-explorer/%2Fopt%2Fsplunk count=0 f=name f=lastModifiedTime splunk_server=* \ +| search name=*core* \ +| eval recent=now()-(7200) \ +| where lastModifiedTime>recent \ +| sort - lastModifiedTime \ +| eval lastModifiedTime=strftime(lastModifiedTime, "%+") \ +| eval indexer_cluster=`indexer_cluster_name(splunk_server)` \ +| eval search_head=splunk_server \ +| eval search_head_cluster=`search_head_cluster` \ +| eval env=if(indexer_cluster==splunk_server,search_head_cluster,indexer_cluster) \ +| table env, splunk_server, name, title, lastModifiedTime +disabled = 1 + +[MonitoringConsole - Crash logs have appeared on the filesystem] +alert.suppress = 0 +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 2 +counttype = number of events +cron_schedule = 57 2,6,10,14,18,22 * * * +description = Chance the alert requires action? High. Crash logs are normally an issue that should be investigated with Splunk support if they are not a known issue, or if dmesg or the journal shows no OS-level issues at the time. Note that this can run from any search head, but the monitoring console may make more sense as it has connectivity to all instances +dispatch.earliest_time = -4h@h +dispatch.latest_time = now +display.general.type = statistics +display.page.search.tab = statistics +display.visualizations.charting.chart = area +enableSched = 1 +quantity = 0 +relation = greater than +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = index=_internal source=*crash.log \ +| stats count by source, host, sourcetype \ +| eval indexer_cluster=`indexer_cluster_name(host)` \ +| eval search_head=host \ +| eval search_head_cluster=`search_head_cluster` \ +| eval env=if(indexer_cluster==host,search_head_cluster,indexer_cluster) \ +| table source, host, sourcetype, env +disabled = 1 + +[AllSplunkEnterpriseLevel - error in stdout.log] +alert.suppress = 0 +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 2 +counttype = number of events +cron_schedule = 57 2,6,10,14,18,22 * * * +description = Chance the alert requires action? Moderate. Key errors just advise of invalid syntax/configuration on startup, so these can be a useful warning of items requiring attention +dispatch.earliest_time = -4h@h +dispatch.latest_time = now +display.general.type = statistics +display.page.search.tab = statistics +display.visualizations.charting.chart = area +enableSched = 1 +quantity = 0 +relation = greater than +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = index=_internal source="*/splunkd_stdout.log" key `splunkenterprisehosts` \ +| cluster t=0.9 \ +| table _raw +disabled = 1 + +[SearchHeadLevel - Knowledge bundle status on indexers] +action.keyindicator.invert = 0 +alert.track = 0 +description = Report only? Yes. 
This report advises of the current bundle version/status on the indexing tier from a search head +dispatch.earliest_time = -1h +dispatch.latest_time = now +display.events.fields = ["host","source","sourcetype"] +display.general.type = statistics +display.page.search.tab = statistics +display.visualizations.charting.chart = line +display.visualizations.show = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = | rest `splunkadmins_restmacro` /services/search/distributed/peers ```Note that the replicationStatus is only valid when run from the SHC captain node``` \ +| rex field=bundle_isIndexing "\w\s-\s(?<isindexing>\w+)" \ +| eval latest_bundle=mvindex(bundle_versions,0), isindexing=mvindex(isindexing,0) \ +| table splunk_server host status version guid replicationStatus latest_bundle isindexing + +[DeploymentServer - Count by application] +action.keyindicator.invert = 0 +alert.track = 0 +description = Report only? Yes. Contributed by @trex (radler) on community slack. This report can be run to determine which applications are used by deployment clients. It can be run on either the DeploymentServer or the monitoring console server +dispatch.earliest_time = -30d@d +dispatch.latest_time = now +display.events.fields = ["host","source","sourcetype"] +display.general.type = statistics +display.page.search.tab = statistics +display.visualizations.charting.chart = line +display.visualizations.show = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = | rest /services/deployment/server/clients `splunkadmins_deploymentserver_splunkserver` \ +| table hostname applications.*.stateOnClient \ +| untable hostname applications, value \ +| rex max_match=0 mode=sed field=applications "s/(application\.|\.stateOnClient)//g" \ +| stats count by applications \ +| append [ | rest /services/deployment/server/applications `splunkadmins_deploymentserver_splunkserver` \ +| fields title \ +| rename title as applications \ +| eval count=0 ] \ +| stats sum(count) as count by applications + +[SearchHeadLevel - Knowledge bundle replication times metrics.log] +action.keyindicator.invert = 0 +alert.track = 0 +description = Report only? Yes. This report advises the bundle replication times recorded in metrics.log +dispatch.earliest_time = -12h +dispatch.latest_time = now +display.events.fields = ["host","source","sourcetype"] +display.general.type = statistics +display.page.search.tab = statistics +display.visualizations.charting.chart = line +display.visualizations.show = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = index=_internal `searchheadhosts` sourcetype=splunkd `splunkadmins_metrics_source` TERM(group=bundles_uploads) TERM(status=success) \ +| `splunkadmins_sh_knowledgebundle_metrics_filter` \ +| eval rep_time=replication_time_msec/1000/60 \ +| bin _time span=`splunkadmins_sh_knowledgebundle_metrics_timespan` \ +| stats max(rep_time) AS slowest_rep_time, avg(rep_time) AS avg_rep_time by host, peer_name, _time, bundle_type \ +| eval slowest_rep_time=round(slowest_rep_time,2), avg_rep_time=round(avg_rep_time, 2) \ +| sort - slowest_rep_time + +[IndexerLevel - DataModel Acceleration - Indexes in use] +action.keyindicator.invert = 0 +alert.track = 0 +description = Report only? Yes. 
This report advises the indexes in use per datamodel (at least for models with acceleration) +dispatch.earliest_time = -12h +dispatch.latest_time = now +display.events.fields = ["host","source","sourcetype"] +display.general.type = statistics +display.page.search.tab = statistics +display.visualizations.charting.chart = line +display.visualizations.show = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = | rest "/services/admin/introspection--disk-objects--summaries?count=-1" `splunkindexerhostsvalue` \ + ```If you want just a single datamodel you can do | tstats summariesonly=t count from datamodel=<datamodel> by index, remove the summariesonly=t to search non-accelerated. Finally | datamodel <datamodel> acceleration_search will show the accelerated search``` \ +| stats sum(total_size) AS total_size, sum(total_bucket_count) AS total_bucket_count, values(related_indexes) AS related_indexes by title, search_head_guid, type + +[SearchHeadLevel - Detect bundle pushes no longer occurring] +alert.suppress = 0 +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 2 +counttype = number of events +cron_schedule = 13 * * * * +description = Chance the alert requires action? High. Once correctly tuned this alert detects if the bundle has stopped getting pushed by the SHC, which can result in outdated knowledge objects and lookups on the indexing tier. Note that should the bundle exceed maxBundleSize it can auto-delete the bundle after candidate creation; this is logged as DEBUG in 8.2.x and WARN or ERROR in 9.x. Finally, cascading bundle replication pre-9.0 without setting cascade_plan_replication_retry_fast=true in distsearch.conf can cause this +dispatch.earliest_time = -1h +dispatch.latest_time = now +display.general.type = statistics +display.page.search.tab = statistics +display.visualizations.charting.chart = area +enableSched = 1 +quantity = 0 +relation = greater than +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = index=_internal `searchheadhosts` sourcetype=splunkd `splunkadmins_metrics_source` TERM(group=bundles_uploads) TERM(status=success) \ +| eval search_head=host \ +| eval search_head_cluster=`search_head_cluster` \ +| timechart aligntime=latest span=`splunkadmins_bundlepush_span` dc(cycle_id) AS count by search_head_cluster \ +| filldown \ +| untable _time, search_head_cluster, count \ +| stats max(_time) AS mostRecent, min(_time) AS firstSeen, last(count) AS lastCount, sum(count) AS total_count by search_head_cluster \ +| eval mostRecent=strftime(mostRecent,"%+"), firstSeen=strftime(firstSeen,"%+") \ +| where lastCount=0 \ +| eval logMessages="No bundles were pushed in the last `splunkadmins_bundlepush_span` minutes, is something broken?! Or does the alert need tweaking? Action required...Note that should the bundle exceed maxBundleSize it can auto-delete the bundle after candidate creation. Finally, cascading bundle replication pre-9.0 without setting cascade_plan_replication_retry_fast=true in distsearch.conf can cause this" \ +| table search_head_cluster, firstSeen, mostRecent, logMessages, total_count +disabled = 1 + +[MonitoringConsole - Check OS ulimits via REST] +alert.suppress = 0 +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 3 +counttype = number of events +cron_schedule = 13 * * * * +description = Chance the alert requires action? Moderate. This is based on a professional services example for checking ulimits. 
This alert just advises if you are below the recommended minimum specs or cannot create core dumps (note that you may need to set /etc/sysctl.conf kernel.core_pattern=/opt/splunk/%e-%s.core or similar to allow Splunk core dumps) +dispatch.earliest_time = -1h +dispatch.latest_time = now +display.general.type = statistics +display.page.search.tab = statistics +display.visualizations.charting.chart = area +enableSched = 1 +quantity = 0 +relation = greater than +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = | rest splunk_server_group=* /services/server/sysinfo f=ulimits* f=transparent_hugepages* f=numberOfVirtualCores f=physicalMemoryMB f=ulimits.core_file_size\ +| fields splunk_server ulimits.data_segment_size ulimits.open_files ulimits.user_processes transparent_hugepages.* numberOfVirtualCores physicalMemoryMB ulimits.core_file_size\ +| eval cpu_core_count = if(isnotnull(numberOfVirtualCores), numberOfVirtualCores, numberOfCores)\ +| eval physical_memory_GB = round(physicalMemoryMB / 1024)\ +| eval cpu_sev = case(cpu_core_count <= 4 OR physical_memory_GB <= 4, 2, cpu_core_count < 12 OR\ +physical_memory_GB < 12, 1, cpu_core_count >= 12 AND physical_memory_GB >= 12, 0, true(), -1)\ +| eval cpu_core_count = cpu_core_count . " / 12 (current/recommended)"\ +| eval physical_memory_GB = physical_memory_GB . " / 12 (current/recommended)"\ +| eval core_sev = case('ulimits.core_file_size' == -1,0,true(),2)\ +| eval transparent_hugepages.enabled = case(len('transparent_hugepages.enabled') > 0,'transparent_hugepages.enabled', 'transparent_hugepages.effective_state' == "ok" AND\ + (isnull('transparent_hugepages.enabled') OR len('transparent_hugepages.enabled') = 0), "feature not available",'transparent_hugepages.effective_state' == "unknown" AND isnull('transparent_hugepages.enabled'), "unknown",\ + true(), "unknown")\ +| eval transparent_hugepages.defrag = case(len('transparent_hugepages.defrag') > 0, 'transparent_hugepages.defrag',\ + 'transparent_hugepages.effective_state' == "ok" AND (isnull('transparent_hugepages.defrag') OR\ + len('transparent_hugepages.defrag') = 0), "feature not available", 'transparent_hugepages.effective_state' ==\ + "unknown" AND isnull('transparent_hugepages.defrag'), "unknown", true(), "unknown") \ +| eval transparent_sev = case('transparent_hugepages.effective_state' == "unavailable", -1,\ + 'transparent_hugepages.effective_state' == "ok", 0, 'transparent_hugepages.effective_state' == "unknown", 1,\ + 'transparent_hugepages.effective_state' == "bad", 2)\ +| eval ulimits.data_segment_size = if(isnotnull('ulimits.data_segment_size'), 'ulimits.data_segment_size',"unavailable") \ +| eval ulimits.open_files = if(isnotnull('ulimits.open_files'), 'ulimits.open_files', "unavailable") \ +| eval ulimits.user_processes = if(isnotnull('ulimits.user_processes'), 'ulimits.user_processes', "unavailable") \ +| eval sev_segment_size = case('ulimits.data_segment_size' == -1 OR 'ulimits.data_segment_size' >= 1073741824, 0, 'ulimits.data_segment_size' == "unavailable", -1, true(), 2)\ +| eval sev_open_files = case('ulimits.open_files' == -1 OR 'ulimits.open_files' >= 64000, 0, 'ulimits.open_files' == "unavailable", -1, true(), 2)\ +| eval sev_user_processes = case('ulimits.user_processes' == -1 OR 'ulimits.user_processes' >= 16000, 0,'ulimits.user_processes' == "unavailable", -1, true(), 2) \ +| eval max_severity_level = max(cpu_sev, transparent_sev,sev_segment_size, sev_open_files, sev_user_processes, core_sev) \ +| fields splunk_server cpu_core_count, 
physical_memory_GB, ulimits.data_segment_size ulimits.open_files ulimits.user_processes transparent_hugepages.enabled transparent_hugepages.defrag transparent_hugepages.effective_state *sev* ulimits.core_file_size\ +| rename splunk_server AS instance \ +| eval 'ulimits.data_segment_size' = (if('ulimits.data_segment_size' >= 0, 'ulimits.data_segment_size', 'ulimits.data_segment_size'))." / 1073741824 (current / recommended)" \ +| eval 'ulimits.open_files' = (if('ulimits.open_files' >= 0,'ulimits.open_files', 'ulimits.open_files'))." / 64000 (current / recommended)" \ +| eval 'ulimits.user_processes' = (if('ulimits.user_processes'>= 0, 'ulimits.user_processes', 'ulimits.user_processes'))." / 16000 (current / recommended)" \ +| eval ulimits.core_file_size = 'ulimits.core_file_size' . " (current / -1 is unlimited) "\ +| fields - _timediff\ +| search max_severity_level!=0 +disabled = 1 + +[IndexerLevel - replicationdatareceiverthread close to 100% utilisation] +alert.suppress = 0 +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 3 +counttype = number of events +cron_schedule = */20 * * * * +description = Chance the alert requires action? Moderate. This has in some environments correlated to degraded ingestion performance across the indexing cluster +dispatch.earliest_time = -20m +dispatch.latest_time = now +display.general.type = statistics +display.page.search.tab = statistics +display.visualizations.charting.chart = area +enableSched = 1 +quantity = 0 +relation = greater than +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = index=_internal sourcetype=splunkd `splunkadmins_metrics_source` group=dutycycle `indexerhosts` thread=replicationdatareceiverthread ```if the replication data receiver thread is using close to 100% for longer than a short period of time it results in other indexers slowing down, and often the entire ingestion tier slowing down``` \ +| eval dutycycle_ratio_perc=(ratio * 100) \ +| bin _time span=5m \ +| stats avg(dutycycle_ratio_perc) AS avg_dutycycle_ratio_perc by host, _time \ +| where avg_dutycycle_ratio_perc>90 \ +| sort 0 - _time \ +| streamstats time_window=6m count by host \ +| eventstats count(eval(count>1)) AS continuous_count by host\ +| where continuous_count>3\ +| fields - count\ +| eval indexer_cluster=`indexer_cluster_name(host)` +disabled = 1 + +[SearchHeadLevel - Accelerated DataModels Access Info] +action.email.useNSSubject = 1 +alert.track = 0 +cron_schedule = 38 * * * * +description = Report only? Yes. As found on Clara-Fication: Finding and Improving Expensive Searches, https://conf.splunk.com/files/2022/slides/PLA1162B.pdf / https://conf.splunk.com/files/2022/recordings/PLA1162B_1080.mp4. Run on the search head with the DMA +dispatch.earliest_time = -65m@m +dispatch.latest_time = -5m@m +display.events.fields = ["index","sourcetype","host"] +display.general.type = statistics +enableSched = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = | rest `splunkadmins_restmacro` /services/admin/summarization by_tstats=1 \ +| eval summary.access_time = strftime('summary.access_time', "%F %T") \ +| table title summary.access_count summary.access_time summary.size + +[SearchHeadLevel - Dashboards resulting in concurrency issues] +action.email.useNSSubject = 1 +alert.track = 0 +cron_schedule = 38 * * * * +description = Report only? Yes. 
As found on Clara-Fication: Finding and Improving Expensive Searches, https://conf.splunk.com/files/2022/slides/PLA1162B.pdf / https://conf.splunk.com/files/2022/recordings/PLA1162B_1080.mp4. Finds dashboards that result in the concurrency warnings in splunkd or scheduler log files +dispatch.earliest_time = -65m@m +dispatch.latest_time = -5m@m +display.events.fields = ["index","sourcetype","host"] +display.general.type = statistics +enableSched = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = index=_internal (sourcetype=splunkd `splunkadmins_splunkd_source`) OR sourcetype=scheduler "The maximum number of concurrent" `searchheadhosts` \ +| rex field=id "(?<ssuser>[^_]+)__" \ +| eval user = coalesce(user, username, ssuser), search_id = coalesce(savedsearch_id, id, "null") \ +| stats count as total_occurrences values(reason) as reason values(search_type) as search_type values(provenance) as provenance by host user search_id \ +| where isnotnull(provenance) \ +| stats values(search_id) as search_ids sum(total_occurrences) as total_occurrences values(reason) as reason by host user provenance + +[SearchHeadLevel - Dashboards that may benefit from base or post-process searches] +action.email.useNSSubject = 1 +alert.track = 0 +cron_schedule = 38 * * * * +description = Report only? Yes. As found on Clara-Fication: Finding and Improving Expensive Searches, https://conf.splunk.com/files/2022/slides/PLA1162B.pdf / https://conf.splunk.com/files/2022/recordings/PLA1162B_1080.mp4. Finds dashboards that may benefit if a post-process or base search was used within the dashboard +dispatch.earliest_time = -65m@m +dispatch.latest_time = -5m@m +display.events.fields = ["index","sourcetype","host"] +display.general.type = statistics +enableSched = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = | rest `splunkadmins_restmacro` /servicesNS/-/-/data/ui/views f=eai:data f=title f=eai:appName timeout=900 \ +| fields title eai:appName eai:data splunk_server author \ +| search eai:data="*<search>*" \ +| xpath outfield=base_id "//search/@id" field=eai:data \ +| xpath outfield=query "//query" field=eai:data \ +| rex field=query "\|?(?<generating_spl>[^\|]+)(\||.*)" \ +| eval total_query=mvcount(generating_spl) \ +| eval dc_query=mvdedup(generating_spl) \ +| eval distinct_query=mvcount(dc_query) \ +| stats values(total_query) as total_query values(distinct_query) as query_count values(base_id) as base_id list(generating_spl) as generating_spl by title eai:appName author splunk_server \ +| where total_query!=query_count + +[SearchHeadLevel - Searches by search type] +action.email.useNSSubject = 1 +alert.track = 0 +cron_schedule = 38 * * * * +description = Report only? Yes. As found on Clara-Fication: Finding and Improving Expensive Searches, https://conf.splunk.com/files/2022/slides/PLA1162B.pdf / https://conf.splunk.com/files/2022/recordings/PLA1162B_1080.mp4. 
Count searches by search type +dispatch.earliest_time = -65m@m +dispatch.latest_time = -5m@m +display.events.fields = ["index","sourcetype","host"] +display.general.type = statistics +enableSched = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = index=_audit `searchheadhosts` action=search info=completed \ +| rex field=provenance "^(?<provenance_group>[^:]+(:[^:]+)?)" \ +| stats dc(app) as apps dc(user) as users count as searches sum(total_run_time) as seconds by provenance_group \ +| addinfo \ +| eval concurrency_factor = round(seconds / (info_max_time - info_min_time), 2) \ +| fields - info_* \ +| sort - seconds + +[IndexerLevel - Buckets have being frozen due to index sizing SmartStore] +action.keyindicator.invert = 0 +alert.suppress = 0 +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 4 +counttype = number of events +cron_schedule = 33 3,7,11,15,19,23 * * * +description = Chance the alert requires action? High. One or more indexes have hit the index size limit and buckets are now being frozen as a result. Note this won't work for non-SmartStore based indexers, refer to the alert IndexerLevel - Buckets have being frozen due to index sizing +dispatch.earliest_time = -5h +dispatch.latest_time = now +display.events.fields = ["host","source","sourcetype"] +display.general.type = statistics +display.page.search.tab = statistics +display.visualizations.charting.chart = area +enableSched = 1 +quantity = 0 +relation = greater than +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = ```The indexer is freezing buckets due to disk space pressure before the frozenTimePeriodInSecs limit has been reached; this could be a problem if it is not expected...```\ +index=_internal `cluster_masters` sourcetype=splunkd (`splunkadmins_splunkd_source`) freezing reason NOT "exceeded frozenTimePeriodInSecs" \ +`splunkadmins_bucketfrozen`\ +| eval newestDataInBucket=strftime(bucket_latest , "%+"), oldestDataInBucket = strftime(bucket_earliest, "%+") \ +| eval message=coalesce(message,event_message)\ +| table message, oldestDataInBucket, newestDataInBucket +disabled = 1 + +[IndexerLevel - Connection errors to SmartStore] +action.keyindicator.invert = 0 +alert.suppress = 0 +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 4 +alert_condition = where statusCode>500 AND count>5 +counttype = number of events +cron_schedule = 4 * * * * +description = Chance the alert requires action? High. If you are seeing errors from SmartStore it's often a sign of an issue; this is more useful outside the AWS environment, where issues are less common. Note it defaults to firing on more than 5 events with status codes above 500. 
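+# Example only (a hedged illustration, not part of the shipped alert): the alert_condition above keys off the statusCode and count fields produced by the stats clause in the search below, so the threshold can be tuned here without changing the SPL. For instance, to also match HTTP 500 itself and require more persistent failures, something like: +# alert_condition = where statusCode>=500 AND count>20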
+dispatch.earliest_time = -60m@m +dispatch.latest_time = now +display.events.fields = ["host","source","sourcetype"] +display.general.type = statistics +display.page.search.tab = statistics +display.visualizations.charting.chart = area +enableSched = 1 +quantity = 0 +relation = greater than +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = ```This alert is created to check for any 500 or 503 error from the Smart Store/S3```\ + index=_internal sourcetype=splunkd `indexerhosts` `splunkadmins_splunkd_source` ( log_level=ERROR OR log_level=WARN ) S3Client \ +| cluster labelonly=true t=0.5\ +| stats earliest(_time) as FirstSeen, latest(_time) as LastSeen count, first(_raw) AS _raw by statusCode, cluster_label\ +| convert ctime(FirstSeen) ctime(LastSeen)\ +| fields - cluster_label +disabled = 1 + +[SearchHeadLevel - Sourcetypes usage from search telemetry data] +action.email.useNSSubject = 1 +alert.track = 0 +cron_schedule = 38 * * * * +description = Report only? Yes. Use introspection data to find sourcetypes in use (note there is a lot of useful information in here beyond sourcetypes). Note that this appears to capture all searches excluding sub-searches and excluding cancelled searches. \ +I believe it relates to the search summarize tstats=t maintain=""... search which is then ingested via /opt/splunk/var/run/splunk/search_telemetry \ +Note that in my testing if a scheduled search exceeds the limits.conf [search] max_count= setting, then information such as phase0 and sourcetypes do not get recorded for the savedsearch. This happened in 9.0.3/9.1.2 and did not affect ad-hoc searches at all; the limit was 500K. +dispatch.earliest_time = -65m@m +dispatch.latest_time = -5m@m +display.events.fields = ["index","sourcetype","host"] +display.general.type = statistics +enableSched = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = index=_introspection sourcetype=search_telemetry `searchheadhosts` \ +| eval first_search_command = mvindex('search_commands{}.name', 0) \ +| eval all_sourcetypes="" \ +| foreach desc.sourcetypes.* [| eval all_sourcetypes=coalesce("<<MATCHSTR>>".",","").all_sourcetypes] \ +| makemv all_sourcetypes delim="," \ +| table search_id, status, timestamp, type, desc.app, desc.earliest_time, desc.latest_time, desc.batch_mode_search, desc.provenance, all_sourcetypes + +[syslog-ng - cache statistics summary] +action.email.useNSSubject = 1 +alert.track = 0 +cron_schedule = 38 * * * * +description = Report only? Yes. This uses the syslog-ng internal log files, which are the d_local_internal destination of syslog-ng. 
Assuming you are reading this into Splunk, the statistics can be useful to help determine where an issue might be.\ +Marc Andersen (NIL815 ApS) provided this example on community slack +dispatch.earliest_time = -65m@m +dispatch.latest_time = -5m@m +display.events.fields = ["index","sourcetype","host"] +display.general.type = statistics +enableSched = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = index=_internal `sysloghosts` statistics\ +| rex mode=sed "s/',/';/g"\ +| rex mode=sed "s/([^ =]*(?=='[^']*'))(=)'([^']*)'/\1_\3/g"\ +| extract pairdelim=";" kvdelim="="\ +| sort 0 + host, _time\ +| streamstats current=true global=true window=2 first(processed_*) AS prev_processed_* by host\ +| foreach processed_*\ +[ eval <<FIELD>>diff='<<FIELD>>'-'prev_<<FIELD>>' ]\ +| fields - processed_center_received_diff\ +| timechart avg(*_diff) AS "*_diff" span=10m by host\ +```At this point you probably want to do a | fields *d_syslog_output* or similar``` + +[SearchHeadLevel - Knowledge Bundle contents] +action.email.useNSSubject = 1 +alert.track = 0 +cron_schedule = 38 * * * * +description = Report only? Yes. This report relies on an external app, Admins Little Helper for Splunk, https://splunkbase.splunk.com/app/6368\ +This tool helps you see inside the contents of a search head's knowledge bundle. This is very useful with the computed option when you are seeing warning messages in splunkd.log such as\ +WARN DistributedBundleReplicationManager [92649 BundleReplicatorThread] - Discard the candidate bundle as its size exceeds maxBundleSize= and\ +WARN DistributedBundleReplicationManager [92649 BundleReplicatorThread] - Bundle Replication is blocked, distributed searches continue to run against preserved bundle /opt/splunk/var/run/<file>.bundle +dispatch.earliest_time = -65m@m +dispatch.latest_time = -5m@m +display.events.fields = ["index","sourcetype","host"] +display.general.type = statistics +enableSched = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = ```This is an example of what the search looks like, remove the comment for this to work. Note that you can remove the bundle=computed so that you see the current bundle, computed is more useful if you cannot distribute the bundle due to size. | bundlefiles bundle=computed``` "" \ +| sort 0 - bytes \ +| eval MB=round(bytes/1024/1024) \ +| eventstats sum(bytes) AS total_kv_mb by kvstore_collection, kvstore_app \ +| eval total_kv_mb=round(total_kv_mb/1024/1024) \ +| table path, app, bytes, MB, source, bundle_epoch, kvstore_app, kvstore_collection, total_kv_mb + +[SearchHeadLevel - audit.log - lookup usage] +action.email.useNSSubject = 1 +alert.track = 0 +cron_schedule = 38 * * * * +description = Report only? Yes. This report attempts to determine which lookups are in active use by querying the audit.log files. 
Note that automatic lookups do not appear in audit.log files from my testing +dispatch.earliest_time = -65m@m +dispatch.latest_time = -5m@m +display.events.fields = ["index","sourcetype","host"] +display.general.type = statistics +enableSched = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = index=_audit `searchheadhosts` search=* lookup OR outputlookup OR inputlookup OR apply TERM(info=completed) NOT "search_type=\"typeahead\"" \ +| rex "(?s), search='(?P<search>.*)\]$" \ +| rex field=search mode=sed "s/\n/ /g" \ +| rex field=search mode=sed "s/```.*?```/ /g" \ +| regex search="((input|output)?lookup)|(\|\s+apply\s+)" \ +``` remove any fields called lookup_file ```\ +| eval lookup_file=null()\ +| eval search_head = host \ +| eval search_head_cluster=`search_head_cluster` \ +| rex field=search mode=sed "s/\n/ /g"\ +| rex field=search mode=sed "s/```.*?```/ /g" \ +``` remove append= local= update= key_field= as that will confuse the regexes to extract the lookup file name ```\ +| rex field=search mode=sed "s/(append|local|update|key_field)\s*=\s*\S+\s+/ /g"\ + ``` deal with standard | inputlookup, | outputlookup or | lookup ``` \ +| rex max_match=20 field=search "(?ms)\|\s*(?P<operation>((input|output)?lookup|apply))\s+\"?(?P<lookup_file>[^\"\s'\|\]]+)" \ + ``` deal with | from:inputlookup: "lookupfile.csv" ``` \ +| rex max_match=20 field=search "(?ms)\|\s*from\s+(?P<operation>inputlookup):\s*\"?(?P<lookup_file2>[^\"\s'\|\]]+)"\ + ``` deal with subsearches with [ inputlookup ] or [ outputlookup ] ``` \ +| rex max_match=20 field=search "(?ms)\s*\[\s*(?P<operation>((input|output)?lookup)|apply)\s+\"?(?P<lookup_file3>[^\"\s'\|\]]+)" \ +| rex max_match=20 field=search "(?ms)\s*\[\s*from\s+(?P<operation>inputlookup):\s*\"?(?P<lookup_file4>[^\"\s'\|\]]+)"\ + ``` this one occurs in sub-searches for example search=' inputlookup filename.csv' could work in a pure-subsearch only, otherwise it's missing the | symbol at the start ``` \ +| rex max_match=20 field=search "(?ms)(^\s*|\s*\|\s*)(?P<operation2>((input|output)?lookup|apply))\s+\"?(?P<lookup_file5>[^\"\s'\|\]]+)" \ +| rex max_match=20 field=search "(?ms)(^\s*|\s*\|\s*)from\s+(?P<operation2>inputlookup):\s*\"?(?P<lookup_file6>[^\"\s\|\]]+)" \ + ``` there is a case here where someone could use multiple techniques in 1 search e.g. 
| from:inputlookup and | lookup and this search will miss them but that seems unlikely ``` \ +| eval lookup_file=coalesce(lookup_file,lookup_file2,lookup_file3,lookup_file4) \ +| eval lookup_file_subsearch=if(match(search_id, "^'subsearch_") AND (isnotnull(lookup_file5) OR isnotnull(lookup_file6)),coalesce(lookup_file5,lookup_file6),null()) \ +| eval lookup_file=coalesce(lookup_file,lookup_file_subsearch)\ +| eval operation=if(isnotnull(lookup_file_subsearch),coalesce(operation,operation2),operation)\ +``` unsure why this occurs but some apps use an app: syntax on the apply command which appears to result in __mlspl_ working as normal...they also do not use the extension .mlmodel ``` \ +| rex mode=sed "s/app://" field=lookup_file\ +| eval combined=mvzip(operation,lookup_file,"|")\ +| rex field=combined mode=sed "s/^apply\|/apply|__mlspl_/"\ +| rex field=combined mode=sed "s/^(apply\|.*)/\1.mlmodel/"\ +| eval lookup_file=mvmap(combined,mvindex(split(combined,"|"),1))\ +``` splunk sometimes extracts this correctly and sometimes auto-extraction fails ```\ +| rex ", savedsearch_name=\"(?P<savedsearch_name>[^\"]+)\","\ +| eval savedsearch_name=if(savedsearch_name=="",null(),savedsearch_name)\ +| stats max(_time) AS _time, count, values(savedsearch_name) AS savedsearch_name, values(provenance) AS provenance by lookup_file, user, app, search_head_cluster, operation \ +| eval savedsearch_name=mvjoin(savedsearch_name,",") + +[IndexerLevel - RemoteSearches - lookup usage] +action.email.useNSSubject = 1 +alert.track = 0 +cron_schedule = 38 * * * * +description = Report only? Yes. This report attempts to determine which lookups are in active use by querying the remote_searches.log file on indexers. Note that automatic lookups do not appear in remote searches log files from my testing +dispatch.earliest_time = -65m@m +dispatch.latest_time = -5m@m +display.events.fields = ["index","sourcetype","host"] +display.general.type = statistics +enableSched = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = index=_internal `indexerhosts` sourcetype=splunkd_remote_searches source=*remote_searches.log* lookup TERM(StreamedSearch) TERM(starting:) NOT TERM(terminated:)\ +| rex "(?s) active_searches=[^,]+, search='(?P<search>.*?)', remote_ttl" \ +| rex max_match=20 field=search "(?ms)\|\s*(?P<operation>lookup)\s+\"?(?P<lookup_file>[^\"\s'\|]+)"\ +| where isnotnull(lookup_file)\ +| rex "starting: search_id=(?P<search_id>[^,]+)"\ +| rex field=search_id "^remote_(?P<sid>.*)"\ +| rex "search_id=[^,]+,\s+server=(?P<server>[^,]+)"\ +| eval server_with_underscore = server. 
"_"\ +| eval sid=replace(sid, server_with_underscore, "")\ +| eval search_head = server\ +| eval search_head_cluster=`search_head_cluster` \ +| `search_type_from_sid(sid)` \ +| `base64decode(base64username)` \ +| eval username3="unknown" \ +| eval user=coalesce(username, base64username, username3)\ +| `base64decode(base64appname)`\ +| eval app3="N/A" \ +| eval app=coalesce(app,base64appname,app3)\ +| fillnull value="N/A" app, user\ +| rex ", apiEndTime='[^,]+,\s+savedsearch_name=\"(?P<savedsearch_name>[^\"]+)" \ +| lookup rmd5_to_savedsearchname RMDvalue AS searchname OUTPUTNEW savedsearch_name \ +| eval type=if(match(savedsearch_name, "^_ACCELERATE"),"acceleration",type) \ +| stats max(_time) AS _time, count, values(savedsearch_name) AS savedsearch_name, values(type) AS type by search_head_cluster, app, lookup_file, user, operation \ +| eval savedsearch_name=mvjoin(savedsearch_name,",") + +[SearchHeadLevel - summary indexing searches not using durable search] +action.keyindicator.invert = 0 +alert.suppress = 0 +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 4 +counttype = number of events +cron_schedule = 4 8 * * * +description = Chance the alert requires action? Low. This alert simply advises that durable search should be used https://docs.splunk.com/Documentation/Splunk/latest/Report/Durablesearch to prevent any loss in summary indexing +dispatch.earliest_time = -60m@m +dispatch.latest_time = now +display.events.fields = ["host","source","sourcetype"] +display.general.type = statistics +display.page.search.tab = statistics +display.visualizations.charting.chart = area +enableSched = 0 +quantity = 0 +relation = greater than +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = | rest `splunkadmins_restmacro` timeout=600 /servicesNS/-/-/saved/searches f=search count=0 f=durable.track_time_type f="action.summary_index" search="disabled=0" search="is_scheduled=1" f=eai:acl* f=next_scheduled_time\ +| search `splunkadmins_summaryindex_durablesearch` \ +| rex field=search "(?ms)\|\s*(?P<match>(collect|mcollect))"\ +| where (action.summary_index=="1" OR isnotnull(match)) AND len('durable.track_time_type')<1\ +| rename eai:acl.app AS app, author as owner, eai:acl.sharing AS sharing\ +| table title, app, owner, sharing, updated + +[SearchHeadLevel - license usage per sourcetype per index] +action.email.useNSSubject = 1 +alert.track = 0 +cron_schedule = 38 * * * * +description = Report only? Yes. This report attempts to determine the usage per index per sourcetype, which is a useful way to indirectly answer the question of if an index and sourcetype combination have received data recently. Alternatives include tstats, metasearch or apps like TrackMe. 
This is just a very lightweight and fast way to track data; however, it only reports as often as indexers send in license usage +dispatch.earliest_time = -65m@m +dispatch.latest_time = -5m@m +display.events.fields = ["index","sourcetype","host"] +display.general.type = statistics +enableSched = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = index=_internal `licensemasterhost` source=*license_usage.log* sourcetype=splunkd\ +| stats sum(b) AS bytes, latest(_time) AS _time by st, idx, i\ +| join type=outer [ | rest /services/server/info f=guid | rename guid as i, splunk_server AS source_host | table i, source_host ]\ +| fillnull source_host value="Unknown"\ +| eval indexer_cluster_name=`indexer_cluster_name(source_host)`\ +| rename st as sourcetype, idx AS index\ +| stats latest(_time) AS _time, sum(bytes) AS bytes by sourcetype, index, indexer_cluster_name + +[SearchHeadLevel - Lookup file owners] +action.email.useNSSubject = 1 +alert.track = 0 +cron_schedule = 38 * * * * +description = Report only? Yes. This report attempts to determine the owner of every lookup file on the search head in question, useful for summary indexing or storing into a lookup +dispatch.earliest_time = -65m@m +dispatch.latest_time = -5m@m +display.events.fields = ["index","sourcetype","host"] +display.general.type = statistics +enableSched = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = | rest /servicesNS/-/-/data/lookup-table-files `splunkadmins_restmacro` timeout=900 count=0 \ + ``` converting to epochtime fails if the last updated time is 1970/doesn't have an entry in metadata, so this will exclude most application default lookup files ``` \ +| eval lastupdated=strptime(updated, "%Y-%m-%dT%H:%M:%S%:z") \ +| where isnotnull(lastupdated) \ +| table "title" "eai:appName" "eai:acl.owner" "eai:acl.sharing" \ +| rename "title" as "lookup_file" "eai:appName" as "app" "eai:acl.owner" as "owner" "eai:acl.sharing" as sharing \ +| join type=outer lookup_file, app, sharing \ + [| rest /servicesNS/-/-/data/transforms/lookups `splunkadmins_restmacro` search=type=file f=filename f=eai:acl* timeout=900 count=0 \ + | table "eai:acl.owner" "eai:acl.app" filename "eai:acl.sharing" "lastupdated" title \ + | rename "filename" as "lookup_file" "eai:acl.app" as "app" "eai:acl.owner" as "owner_from_definition" "eai:acl.sharing" AS sharing title AS lookup_name\ + ``` unfortunately it's challenging to tell if a lookup came with an app or was uploaded later, however a lookup definition cannot be deleted if it's not from the local directory, so we can use this to eliminate\ + lookup files that should not be deleted ``` \ + | eval cannot_delete="true" ] \ +| join type=outer lookup_name app owner_from_definition \ + [ | makeresults ``` Remove the make results and uncomment once you have Webtools Add-on, https://splunkbase.splunk.com/app/4146, installed otherwise this search may attempt to delete lookup files that relate to default/ config. 
| curl uri="https://localhost:8089/servicesNS/-/-/data/transforms/lookups?count=0&output_mode=json&search=type=file&f=eai:acl*" splunkauth=true ``` \ + | rex field=curl_message "\"remove\":\"(?P<lookup_location>[^\"]+)\".*?\"author\":\"(?P<owner_from_definition>[^\"]+)" max_match=0 \ + | fields lookup_location, owner_from_definition \ + | eval combined=mvzip(lookup_location, owner_from_definition) \ + | fields combined \ + | mvexpand combined \ + | eval combined=split(combined,",") \ + | eval lookup_location=mvindex(combined,0), owner_from_definition=mvindex(combined,1) \ + | table lookup_location, owner_from_definition \ + | rex field=lookup_location "/servicesNS/[^/]+/(?P<app>[^/]+)/data/transforms/lookups/(?P<lookup_name>[^/]+)" \ + | eval lookup_name=urldecode(lookup_name) \ + | eval can_delete="true" ]\ +| where (isnull(cannot_delete) AND isnotnull(can_delete)) OR (isnull(can_delete) AND isnull(cannot_delete)) \ +| fields - lookup_location, can_delete + +[SearchHeadLevel - REST API usage via audit.log] +action.email.useNSSubject = 1 +alert.track = 0 +cron_schedule = 38 * * * * +description = Report only? Yes. This report may not be 100% accurate, but the goal is to list, via audit.log, search data for searches run through the REST API +dispatch.earliest_time = -65m@m +dispatch.latest_time = -5m@m +display.events.fields = ["index","sourcetype","host"] +display.general.type = statistics +enableSched = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = index=_audit `searchheadhosts` "provenance=\"N/A\"" OR provenance="rest" info=granted OR info=completed action=search \ +| regex search_id!="^'(sub|ta)" \ +| table search_id, total_run_time, search_et, search_lt, api_et, api_lt, scan_count, _time, user, info, host \ +| eval search_et=strftime(search_et, "%d/%m/%Y %H:%M"), search_lt=strftime(search_lt, "%d/%m/%Y %H:%M"), api_et=strftime(api_et, "%d/%m/%Y %H:%M"), api_lt=strftime(api_lt, "%d/%m/%Y %H:%M"), _time=strftime(_time, "%d/%m/%Y %H:%M") + +[SearchHeadLevel - Lookup Editor lookup updates] +action.email.useNSSubject = 1 +alert.track = 0 +cron_schedule = 38 * * * * +description = Report only? Yes. This report just determines who has recently used the Lookup Editor to edit a lookup file +dispatch.earliest_time = -65m@m +dispatch.latest_time = -5m@m +display.events.fields = ["index","sourcetype","host"] +display.general.type = statistics +enableSched = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = index=_internal "Lookup edited successfully" sourcetype=lookup_editor_rest_handler `searchheadhosts` \ +| eval search_head=host \ +| eval search_head_cluster=`search_head_cluster` \ +| stats max(_time) AS _time, count by lookup_file, namespace, search_head_cluster, user \ +| rename namespace AS app + +[SearchHeadLevel - Detect lookups that have not being accessed for a period of time] +action.email.useNSSubject = 1 +alert.track = 0 +cron_schedule = 38 * * * * +description = Report only? Yes. This report attempts to determine which lookups have not been used during a period of time, potentially allowing them to be deleted. 
This requires the summary reports "IndexerLevel - RemoteSearches - lookup usage", "SearchHeadLevel - audit.log - lookup usage", "SearchHeadLevel - Lookup Editor lookup updates" \ +Additionally it relies on the lookup populated by "SearchHeadLevel - Lookup file owners" +dispatch.earliest_time = -65m@m +dispatch.latest_time = -5m@m +display.events.fields = ["index","sourcetype","host"] +display.general.type = statistics +enableSched = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = index=summary source IN ("IndexerLevel - RemoteSearches - lookup usage", "SearchHeadLevel - audit.log - lookup usage", "SearchHeadLevel - Lookup Editor lookup updates") \ +| stats latest(_time) as lastUsedTime values(user) as user by lookup_file app search_head_cluster \ +| join app, lookup_file type=outer \ + [| rest /servicesNS/-/-/data/transforms/lookups `splunkadmins_restmacro` search=type=file f=filename f=eai:acl* timeout=900 count=0 \ + | rename eai:acl.app AS app, title AS lookup_file \ + | table app, lookup_file, filename, sharing ] \ +| eval lookup_file=coalesce(filename, lookup_file) \ +| append \ + [| inputlookup splunkadmins_lookupfile_owners \ + | rename "lastupdated" as "lastUsedTime"] \ +| fillnull search_head_cluster value=`search_head_cluster`\ +| stats values(*) AS * by lookup_file app search_head_cluster\ +``` at this point we have the latest use by lookup_file but it's possible a lookup file definition name was used, not the lookup file name. So we lookup the lookup_file from the search and if it's matching a lookup_name we output the file name ```\ +| lookup splunkadmins_lookupfile_owners lookup_name AS lookup_file, app OUTPUT lookup_file AS lookup_file2\ +| lookup splunkadmins_lookupfile_owners lookup_name AS lookup_file OUTPUT lookup_file AS lookup_file_global\ +``` if the lookup file is globally shared we cannot use an app level match, only the lookup file name. However only the lookup names from searches will mention the lookup_file_global field and the definition only has a lookup_file ```\ +| eval lookups_combined=coalesce(lookup_file_global,lookup_file)\ +| eventstats values(sharing) AS sharing_global by lookups_combined, search_head_cluster\ +| eval lookup_file=if(sharing_global=="global" AND isnotnull(lookup_file_global),lookup_file_global,lookup_file)\ +| eval lookup_file=if(isnotnull(lookup_file2),lookup_file2,lookup_file)\ +``` now we can find the latest by the filename even if a lookup definition was used ```\ +| eventstats max(lastUsedTime) AS lastUsedTime by lookup_file, app, search_head_cluster\ +| eventstats max(lastUsedTime) AS lastUsedTime_Global, values(user) AS user_global by lookup_file search_head_cluster\ +``` if the lookup is globally shared the last used time in any app is counted as the last used time of the lookup ```\ +| eval lastUsedTime=if(sharing=="global",lastUsedTime_Global,lastUsedTime)\ +| eval user=if(sharing=="global",user_global,user) \ +| eval lastUsedTime=strftime(lastUsedTime, "%+")\ +| stats values(*) AS * by lookup_file app search_head_cluster\ +``` if the owner is null the lookup doesn't exist (a potential false match by the summary searches) ```\ +| where isnotnull(owner)\ +| eval user=mvdedup(mvappend(mvappend(user,owner),owner_from_definition)) + +[SearchHeadLevel - Lookups within dashboards] +action.email.useNSSubject = 1 +alert.track = 0 +cron_schedule = 38 * * * * +description = Report only? Yes. 
This report attempts to find all lookups that exist within all dashboards; it relies on the lookup populated by "SearchHeadLevel - Lookup file owners" +dispatch.earliest_time = -65m@m +dispatch.latest_time = -5m@m +display.events.fields = ["index","sourcetype","host"] +display.general.type = statistics +enableSched = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = | rest `splunkadmins_restmacro` timeout=900 /servicesNS/-/-/data/ui/views f=eai:data f=eai:acl* \ +| regex eai:data="((input|output)?lookup)|(\|\s+apply\s+)" \ +| spath input=eai:data \ +| foreach *search*.query [ eval combined = mvappend(combined,'<<FIELD>>') ] \ +| rename combined AS search \ +| fields sharing, title, app, search eai:acl.app eai:acl.sharing \ +| rename eai:acl.app AS app, eai:acl.sharing AS sharing \ +| regex search="((input|output)?lookup)|(\|\s+apply\s+)" \ +| rex field=search mode=sed "s/\n/ /g" \ +| rex field=search mode=sed "s/```.*?```/ /g" \ +``` remove append= local= update= key_field= as that will confuse the regexes to extract the lookup file name ``` \ +| rex field=search mode=sed "s/(append|local|update|key_field)\s*=\s*\S+\s+/ /g" \ + ``` deal with standard | inputlookup, | outputlookup or | lookup ``` \ +| rex max_match=20 field=search "(?ms)\|\s*(?P<operation>((input|output)?lookup|apply))\s+\"?(?P<lookup_file>[^\"\s'\|\]]+)" \ + ``` deal with | from:inputlookup: "lookupfile.csv" ``` \ +| rex max_match=20 field=search "(?ms)\|\s*from\s+(?P<operation>inputlookup):\s*\"?(?P<lookup_file2>[^\"\s'\|\]]+)" \ + ``` deal with subsearches with [ inputlookup ] or [ outputlookup ] ``` \ +| rex max_match=20 field=search "(?ms)\s*\[\s*(?P<operation>((input|output)?lookup)|apply)\s+\"?(?P<lookup_file3>[^\"\s'\|\]]+)" \ +| rex max_match=20 field=search "(?ms)\s*\[\s*from\s+(?P<operation>inputlookup):\s*\"?(?P<lookup_file4>[^\"\s'\|\]]+)" \ + ``` this one occurs in sub-searches for example search=' inputlookup filename.csv' could work in a pure-subsearch only, otherwise it's missing the | symbol at the start ``` \ +| rex max_match=20 field=search "(?ms)(^\s*|\s*\|\s*)(?P<operation2>((input|output)?lookup|apply))\s+\"?(?P<lookup_file5>[^\"\s'\|\]]+)" \ +| rex max_match=20 field=search "(?ms)(^\s*|\s*\|\s*)from\s+(?P<operation2>inputlookup):\s*\"?(?P<lookup_file6>[^\"\s\|\]]+)" \ + ``` there is a case here where someone could use multiple techniques in 1 search e.g. | from:inputlookup and | lookup and this search will miss them but that seems unlikely ``` \ +| eval lookup_file=mvdedup(mvappend(lookup_file,lookup_file2,lookup_file3,lookup_file4,lookup_file5)) \ +| lookup splunkadmins_lookupfile_owners lookup_name AS lookup_file, app OUTPUT lookup_file AS lookup_file2 \ +| lookup splunkadmins_lookupfile_owners lookup_name AS lookup_file OUTPUT lookup_file AS lookup_file_global \ +``` if the lookup file is globally shared we cannot use an app level match, only the lookup file name. However only the lookup names from searches will mention the lookup_file_global field and the definition only has a lookup_file ``` \ +| eval lookups_combined=coalesce(lookup_file_global,lookup_file) \ +| eventstats values(sharing) AS sharing_global by lookups_combined \ +| eval lookup_file=if(sharing_global=="global" AND isnotnull(lookup_file_global),lookup_file_global,lookup_file) \ +| eval lookup_file=if(isnotnull(lookup_file2),lookup_file2,lookup_file) \ +``` this line checks if the lookup exists as such so we don't include kvstore lookups. 
Additionally we want the sharing of the lookup, not the dashboard ``` \ +| lookup splunkadmins_lookupfile_owners lookup_file OUTPUT lookup_file AS lookup_file, sharing \ +| where isnotnull(lookup_file) \ +| stats count by app, sharing, lookup_file + +[SearchHeadLevel - Lookups within savedsearches] +action.email.useNSSubject = 1 +alert.track = 0 +cron_schedule = 38 * * * * +description = Report only? Yes. This report attempts to find all lookups that exist within all savedsearches; it relies on the lookup populated by "SearchHeadLevel - Lookup file owners" +dispatch.earliest_time = -65m@m +dispatch.latest_time = -5m@m +display.events.fields = ["index","sourcetype","host"] +display.general.type = statistics +enableSched = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = | rest `splunkadmins_restmacro` timeout=900 /servicesNS/-/-/saved/searches f=search f=eai:acl* \ +| regex search="((input|output)?lookup)|(\|\s+apply\s+)" \ +| fields title, search eai:acl.app eai:acl.sharing \ +| rename eai:acl.app AS app, eai:acl.sharing AS sharing \ +| rex field=search mode=sed "s/\n/ /g" \ +| rex field=search mode=sed "s/```.*?```/ /g" \ +``` remove append= local= update= key_field= as that will confuse the regexes to extract the lookup file name ``` \ +| rex field=search mode=sed "s/(append|local|update|key_field)\s*=\s*\S+\s+/ /g" \ + ``` deal with standard | inputlookup, | outputlookup or | lookup ``` \ +| rex max_match=20 field=search "(?ms)\|\s*(?P<operation>((input|output)?lookup|apply))\s+\"?(?P<lookup_file>[^\"\s'\|\]]+)" \ + ``` deal with | from:inputlookup: "lookupfile.csv" ``` \ +| rex max_match=20 field=search "(?ms)\|\s*from\s+(?P<operation>inputlookup):\s*\"?(?P<lookup_file2>[^\"\s'\|\]]+)" \ + ``` deal with subsearches with [ inputlookup ] or [ outputlookup ] ``` \ +| rex max_match=20 field=search "(?ms)\s*\[\s*(?P<operation>((input|output)?lookup)|apply)\s+\"?(?P<lookup_file3>[^\"\s'\|\]]+)" \ +| rex max_match=20 field=search "(?ms)\s*\[\s*from\s+(?P<operation>inputlookup):\s*\"?(?P<lookup_file4>[^\"\s'\|\]]+)" \ + ``` this one occurs in sub-searches for example search=' inputlookup filename.csv' could work in a pure-subsearch only, otherwise it's missing the | symbol at the start ``` \ +| rex max_match=20 field=search "(?ms)(^\s*|\s*\|\s*)(?P<operation2>((input|output)?lookup|apply))\s+\"?(?P<lookup_file5>[^\"\s'\|\]]+)" \ +| rex max_match=20 field=search "(?ms)(^\s*|\s*\|\s*)from\s+(?P<operation2>inputlookup):\s*\"?(?P<lookup_file6>[^\"\s\|\]]+)" \ + ``` there is a case here where someone could use multiple techniques in 1 search e.g. | from:inputlookup and | lookup and this search will miss them but that seems unlikely ``` \ +| eval lookup_file=mvdedup(mvappend(lookup_file,lookup_file2,lookup_file3,lookup_file4,lookup_file5)) \ +| lookup splunkadmins_lookupfile_owners lookup_name AS lookup_file, app OUTPUT lookup_file AS lookup_file2 \ +| lookup splunkadmins_lookupfile_owners lookup_name AS lookup_file OUTPUT lookup_file AS lookup_file_global \ +``` if the lookup file is globally shared we cannot use an app level match, only the lookup file name. 
However only the lookup names from searches will mention the lookup_file_global field and the definition only has a lookup_file ``` \ +| eval lookups_combined=coalesce(lookup_file_global,lookup_file) \ +| eventstats values(sharing) AS sharing_global by lookups_combined \ +| eval lookup_file=if(sharing_global=="global" AND isnotnull(lookup_file_global),lookup_file_global,lookup_file) \ +| eval lookup_file=if(isnotnull(lookup_file2),lookup_file2,lookup_file) \ +``` this line checks if the lookup exists as such so we don't include kvstore lookups. Additionally we want the sharing of the lookup, not the savedsearch ``` \ +| lookup splunkadmins_lookupfile_owners lookup_file OUTPUT lookup_file AS lookup_file, sharing \ +| where isnotnull(lookup_file) \ +| stats count by app, sharing, lookup_file + +[MonitoringConsole - one or more servers require configuration] +alert.suppress = 0 +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 2 +counttype = number of events +cron_schedule = 57 * * * * +description = Chance the alert requires action? High. Note this uses a lookup in the monitoring console app, so you will either need to share the lookup globally or move the alert into the required app. This alert is designed to find a situation where the monitoring console has new servers in the "New" state and not "Configured"; until the Apply changes button is pressed in Settings -> General Setup, those servers will not be part of the searches (and therefore your search results will not see all indexers if you added new indexers into the cluster) +dispatch.earliest_time = -4h@h +dispatch.latest_time = now +display.general.type = statistics +display.page.search.tab = statistics +display.visualizations.charting.chart = area +enableSched = 1 +quantity = 0 +relation = greater than +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = | rest splunk_server=local "/services/search/distributed/peers?output_mode=json&search=disabled%3D0&search=status%3DUp" f=name f=cluster_label f=host f=status* f=search_groups f=server_roles f=peerType\ +| join name type=outer \ + [ | makeresults ``` remove the comments once you have Webtools Add-on, https://splunkbase.splunk.com/app/4146 installed to use the curl command and makeresults placeholder | curl splunkauth=true uri="https://localhost:8089/servicesNS/nobody/splunk_monitoring_console/configs/conf-splunk_monitoring_console_assets/settings?output_mode=json" ```\ + ``` Partially based on the examples on https://gist.github.com/nmattam/bcfbc8a4ebd9a520c2ac50ab0137e58f ``` \ +| spath input=curl_message path="entry{}[0].content.configuredPeers" output=configuredPeers\ + ``` add the new peers and assume that the assumed configuration is accurate, but first dedup in case we have added duplicates through this process ``` \ +| makemv configuredPeers delim="," \ + | table configuredPeers \ + | mvexpand configuredPeers \ + | eval fromjoin="true" \ + | rename configuredPeers AS name ]\ +| where isnull(fromjoin)\ +| table host, name, cluster_label, status, status_details, search_groups, server_roles, peerType +disabled = 1 + +[SearchHeadLevel - Peer timeouts or authentication issues] +alert.suppress = 0 +alert.track = 1 +alert.digest_mode = 1 +alert.severity = 2 +counttype = number of events +cron_schedule = 57 * * * * +description = Chance the alert requires action? Moderate. A connect timeout may be an issue with timeouts or a slow responding peer. 
An "Unable to get authentication token" error may require manual intervention +dispatch.earliest_time = -1h@h +dispatch.latest_time = now +display.general.type = statistics +display.page.search.tab = statistics +display.visualizations.charting.chart = area +enableSched = 1 +quantity = 0 +relation = greater than +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = index=_internal sourcetype=splunkd `splunkadmins_splunkd_source` `searchheadhosts` "Connect Timeout" OR "Unable to get authentication token" component IN (DistributedPeer, GetRemoteAuthToken) \ +| search ```Exclude time periods where shutdowns were occurring including 10 minutes after shutdown to handle any reboot time``` NOT \ + [ `splunkadmins_shutdown_time(indexerhosts,60,600)`] \ +| bin _time span=5m \ +| rex "(?P<peerinfo>peer:\s|Peer:.*)" \ +| rex "from peer:\s+https://(?P<peerinfo>.*)" \ +| rex "peeruri=\"(?P<peerinfo>[^\"]+)" \ +| rex "https://(?P<clientip>[^:]+)" \ +| lookup dnslookup clientip \ +| stats count, last(_raw) AS raw, values(peerinfo) AS peerinfo, values(clienthost) AS clienthost by host, _time \ +| sort - _time \ +| eval _time=strftime(_time,"%+") +disabled = 1 + +[SearchHeadLevel - Datamodel REST endpoint indexes in use] +action.email.useNSSubject = 1 +alert.track = 0 +cron_schedule = 38 * * * * +description = Report only? Yes. This report attempts to find indexes in use by a datamodel via the REST endpoint (| tstats count from datamodel=... groupby index can work as well). Note that for accurate macro substitution the splunkadmins_macros will need to be up-to-date +dispatch.earliest_time = -65m@m +dispatch.latest_time = -5m@m +display.events.fields = ["index","sourcetype","host"] +display.general.type = statistics +enableSched = 0 +request.ui_dispatch_app = SplunkAdmins +request.ui_dispatch_view = search +search = | rest `splunkadmins_restmacro` timeout=900 /servicesNS/-/-/data/models f=eai:data f=eai:acl* \ +| eval splunk_server = `splunkadmins_splunk_server_name` \ +| `splunkadmins_macro_sub('eai:data')` \ +| regex eai:data="index\s*(=|[iI][nN])" \ +| rex field=eai:data "(?P<esstylewildcard>\(\s*index=\*\s+OR\s+index=_\*\s*\))" \ +| rex field=eai:data "(?sm)(NOT\s+index\s*(=|::)\s*[^ ]+)|(NOT\s+\([^\)]+\))|(index\s*(=|::)\s*(\\\)?\"?(?P<indexregex>[\*A-Za-z0-9-_]+))" max_match=50 \ +| rex field=eai:data "(?sm)(NOT\s+index\s+[iI][nN]\s*\([^\)]+)|(index\s+[iI][nN]\s*\((?P<indexin>([^\)\"]+)|\"[^\)\"]+\"))" max_match=50 \ +| makemv delim="," indexin \ +| eval indexes=mvappend(indexregex,indexin) \ +| eval indexes=if(isnotnull(esstylewildcard),mvfilter(NOT match(indexes,"^_?\*$")),indexes) \ +| eval indexes=mvmap(indexes, replace(lower(indexes), "\"", "")) \ +| eval indexes=mvmap(indexes, trim(replace(indexes, "'", ""))) \ +| eval indexes=mvdedup(indexes) \ +| table title, indexes, eai:data, eai:acl.app + +[SearchHeadLevel - Job performance data per indexer] +action.email.useNSSubject = 1 +alert.track = 0 +cron_schedule = 38 * * * * +description = Report only? Yes. This report attempts to report the job performance per indexer of any particular SID. 
+dispatch.earliest_time = -65m@m
+dispatch.latest_time = -5m@m
+display.events.fields = ["index","sourcetype","host"]
+display.general.type = statistics
+enableSched = 0
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = | rest `splunkadmins_restmacro` /services/search/jobs/1702011904.5983330_C87A3145-95EE-4558-B4F7-FE19CB3D7557 \
+| fields performance.dispatch.stream.remote.*.duration_secs performance.dispatch.stream.remote.*.output_count performance.dispatch.stream.remote.*.invocations \
+| untable perf field value \
+| rex field=field "performance\.dispatch\.stream\.remote\.(?<indexer>.*)\.(?P<type>.*)" \
+| eval combined=type . "!" . value \
+| stats values(combined) AS combined by indexer \
+| eval combined_count=mvcount(combined) \
+| where combined_count==3 \
+| eval duration_secs=mvindex(split(mvindex(combined,0),"!"),1) \
+| eval output_count=mvindex(split(mvindex(combined,2),"!"),1) ``` this appears to be bytes rather than event counts ``` \
+```| eval invocations=mvindex(split(mvindex(combined,1),"!"),1)``` \
+| eval output_per_duration=output_count/duration_secs \
+| fields - combined, combined_count
+
+[SearchHeadLevel - Jobs endpoint example]
+action.email.useNSSubject = 1
+alert.track = 0
+cron_schedule = 38 * * * *
+description = Report only? Yes. This report provides fields returned by the jobs REST endpoint. Some of this data is captured by index=_introspection sourcetype=search_telemetry, which is easier to use (and indexed, except for some edge cases around scheduled searches and hitting the limits.conf [search] max_count limit)
+dispatch.earliest_time = -65m@m
+dispatch.latest_time = -5m@m
+display.events.fields = ["index","sourcetype","host"]
+display.general.type = statistics
+enableSched = 0
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = | rest `splunkadmins_restmacro` /servicesNS/-/-/search/jobs \
+| eval publishedEpoch=strptime(published, "%Y-%m-%dT%H:%M:%S.%3N%:z") \
+| where publishedEpoch > relative_time(now(),"-10m") \
+| table published, ttl, title, splunk_server, statusBuckets, sid, searchTotalBucketsCount, searchEarliestTime, searchLatestTime, scanCount, runDuration, resultCount, provenance, priority, pid, optimizedSearch, earliestTime, latestTime, label, isBatchModeSearch, isDone, eventCount, eai:acl.app, doneProgress, dispatchState, diskUsage, delegate, author
+
+[SearchHeadLevel - configtracker index example]
+action.email.useNSSubject = 1
+alert.track = 0
+cron_schedule = 38 * * * *
+description = Report only? Yes. As seen on community slack, posted by yuanliu. This query provides a summary of changes using the _configtracker index. This example uses the ui-prefs.conf file.
Also refer to SearchHeadLevel - configtracker index example2 for a per-host version
+dispatch.earliest_time = -65m@m
+dispatch.latest_time = -5m@m
+display.events.fields = ["index","sourcetype","host"]
+display.general.type = statistics
+enableSched = 0
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = index=_configtracker data.path=*/ui-prefs.conf sourcetype=splunk_configuration_change \
+| spath path=data.changes{} \
+| fields - data.changes{}.* \
+| mvexpand data.changes{} \
+| spath input=data.changes{} \
+| spath input=data.changes{} path=properties{} \
+| mvexpand properties{} \
+| fields - properties{}.* \
+| spath input=properties{} \
+| table data.action data.path name *_value stanza
+
+[MonitoringConsole - one or more servers require configuration automated]
+alert.suppress = 0
+alert.track = 1
+alert.digest_mode = 1
+alert.severity = 2
+counttype = number of events
+cron_schedule = 57 * * * *
+description = Chance the alert requires action? Low. This alert relies on the Webtools Add-on, https://splunkbase.splunk.com/app/4146. Note this alert uses a lookup in the monitoring console app, so you will either need to share the lookup globally or move the alert into the required app. This alert is designed to find a situation where the monitoring console has new servers in the "New" state rather than "Configured". Until the Apply changes button is pressed in Settings -> General Setup, the said servers will not be part of the searches (and therefore your search results will not include all indexers if you added new indexers into the cluster). The automated part of this alert attempts to apply the settings for new indexers only. The alert MonitoringConsole - one or more servers require configuration can be used to alert if manual intervention is required.
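+# A minimal sketch of reading the monitoring console's configured peers via the
+# Webtools Add-on | curl command (assumes the add-on is installed; the same
+# endpoint is used in the search below):
+#   | makeresults
+#   | curl splunkauth=true uri="https://localhost:8089/servicesNS/nobody/splunk_monitoring_console/configs/conf-splunk_monitoring_console_assets/settings?output_mode=json"
+#   | spath input=curl_message path="entry{}[0].content.configuredPeers" output=configuredPeers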
+dispatch.earliest_time = -4h@h
+dispatch.latest_time = now
+display.general.type = statistics
+display.page.search.tab = statistics
+display.visualizations.charting.chart = area
+enableSched = 1
+quantity = 0
+relation = greater than
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = | rest splunk_server=local "/services/search/distributed/peers?output_mode=json&search=disabled%3D0&search=status%3DUp" f=name \
+| join name type=outer \
+ [ | makeresults ``` remove the comments once you have Webtools Add-on, https://splunkbase.splunk.com/app/4146 installed to use the curl command, and the | makeresults ``` \
+| curl splunkauth=true uri="https://localhost:8089/servicesNS/nobody/splunk_monitoring_console/configs/conf-splunk_monitoring_console_assets/settings?output_mode=json"\
+ ``` Partially based on the examples on https://gist.github.com/nmattam/bcfbc8a4ebd9a520c2ac50ab0137e58f ``` \
+| spath input=curl_message path="entry{}[0].content.configuredPeers" output=configuredPeers\
+ ``` add the new peers and assume that the existing configuration is accurate, but first dedup in case we have added duplicates through this process ``` \
+| makemv configuredPeers delim="," \
+ | table configuredPeers \
+ | mvexpand configuredPeers \
+ | eval fromjoin="true" \
+ | rename configuredPeers AS name ]\
+| where isnull(fromjoin)\
+| table peerType, name, cluster_label, search_groups, status, status_details \
+| require \
+| stats values(name) AS names \
+| eval names=mvjoin(names,",")\
+``` remove the comments once you have Webtools Add-on, https://splunkbase.splunk.com/app/4146 installed to use the curl command \
+| curl splunkauth=true uri="https://localhost:8089/servicesNS/nobody/splunk_monitoring_console/configs/conf-splunk_monitoring_console_assets/settings?output_mode=json"\
+``` \
+ ``` Partially based on the examples on https://gist.github.com/nmattam/bcfbc8a4ebd9a520c2ac50ab0137e58f ``` \
+| spath input=curl_message path="entry{}[0].content.configuredPeers" output=configuredPeers\
+ ``` add the new peers and assume that the existing configuration is accurate, but first dedup in case we have added duplicates through this process ``` \
+| makemv configuredPeers delim="," \
+| table configuredPeers, names\
+| mvexpand configuredPeers\
+``` handle a case where we have IPs of servers that have changed in the configuredPeers settings ```\
+| search NOT ([ | rest splunk_server=local "/services/search/distributed/peers?output_mode=json&search=disabled%3D0&search=status%3DUp" \
+| lookup dmc_assets peerURI AS name OUTPUT serverName\
+| lookup dmc_assets serverName AS host OUTPUT peerURI \
+| where isnull(serverName)\
+| eval peerURI=mvdedup(peerURI)\
+| rename peerURI AS configuredPeers\
+| return 999 configuredPeers ])\
+| stats values(configuredPeers) AS configuredPeers, values(names) AS names\
+| eval count=mvcount(configuredPeers) \
+| eval configuredPeers=mvdedup(configuredPeers) \
+| eval configuredPeers=mvjoin(configuredPeers,",")\
+| eval names=mvjoin(names,",")\
+| eval data="configuredPeers=" . configuredPeers . "," . names\
+``` remove the comments once you have Webtools Add-on, https://splunkbase.splunk.com/app/4146 installed to use the curl command \
+| curl splunkauth=true method=post uri="https://localhost:8089/servicesNS/nobody/splunk_monitoring_console/configs/conf-splunk_monitoring_console_assets/settings" datafield=data \
+``` \
+ ``` we have updated the console assets settings, however we also must update the distributed groups...
``` \
+| appendpipe\
+ [| rest splunk_server=local "/services/search/distributed/peers?output_mode=json&search=disabled%3D0&search=status%3DUp" \
+ | lookup dmc_assets host OUTPUT serverName \
+ ``` in some cases the serverName can be the short/non-FQDN name ``` \
+ | lookup dmc_assets serverName AS host OUTPUTNEW serverName \
+ ``` if the IP doesn't match it may not be configured as expected ``` \
+ | lookup dmc_assets peerURI AS name OUTPUT serverName AS serverName2\
+ | where isnull(serverName) OR isnull(serverName2) \
+ | table cluster_label, name \
+ | where isnotnull(cluster_label)\
+ ``` | require```\
+ | stats list(name) AS names by cluster_label\
+ | eval uri="https://localhost:8089/services/search/distributed/groups/dmc_indexerclustergroup_" . cluster_label . "?output_mode=json" \
+ ``` remove the comments once you have Webtools Add-on, https://splunkbase.splunk.com/app/4146 installed to use the curl command \
+ | curl urifield=uri splunkauth=true \
+ ``` \
+ | spath input=curl_message path="entry{}[0].content.member{}" output=member \
+ | makemv names delim=" " \
+ | eval member=mvappend(member,names) \
+ | rex field=member mode=sed "s/(.*)/\"\1\"/g" \
+ | eval member_comb = mvjoin(member, ", ") \
+ | eval data="{\"member\": [" . member_comb . "] }" \
+ | eval uri="https://localhost:8089/services/search/distributed/groups/dmc_indexerclustergroup_" . cluster_label . "/edit" \
+ ``` remove the comments once you have Webtools Add-on, https://splunkbase.splunk.com/app/4146 installed to use the curl command \
+ | curl urifield=uri splunkauth=true datafield=data method=POST\
+ ``` \
+ ``` after updating the indexer cluster group for the required label we must additionally update the dmc group for indexers ``` \
+ | eval uri="https://localhost:8089/services/search/distributed/groups/dmc_group_indexer?output_mode=json" \
+ ``` remove the comments once you have Webtools Add-on, https://splunkbase.splunk.com/app/4146 installed to use the curl command \
+ | curl urifield=uri splunkauth=true \
+ ``` \
+ | spath input=curl_message path="entry{}[0].content.member{}" output=member \
+ | makemv names delim=" " \
+ | eval member=mvappend(member,names) \
+ | rex field=member mode=sed "s/(.*)/\"\1\"/g" \
+ | eval member_comb = mvjoin(member, ", ") \
+ | eval data="{\"member\": [" . member_comb . "] }" \
+ | eval uri="https://localhost:8089/services/search/distributed/groups/dmc_group_indexer/edit" \
+ ``` remove the comments once you have Webtools Add-on, https://splunkbase.splunk.com/app/4146 installed to use the curl command \
+ | curl urifield=uri splunkauth=true datafield=data method=POST\
+ ``` \
+ ] \
+| eval data="trigger_actions=1" \
+ ``` 2023-12-12, I've submitted docs feedback that this endpoint is undocumented but used by the MC; it appears to trigger the actions related to the savedsearch to populate a lookup ``` \
+``` remove the comments once you have Webtools Add-on, https://splunkbase.splunk.com/app/4146 installed to use the curl command
+| curl splunkauth=true method=post uri="https://localhost:8089/servicesNS/nobody/splunk_monitoring_console/saved/searches/DMC+Asset+-+Build+Full/dispatch" datafield=data \
+```
+disabled = 1
+
+[SearchHeadLevel - macros in use]
+action.email.useNSSubject = 1
+alert.track = 0
+cron_schedule = 38 * * * *
+description = Report only? Yes. Determine the macros in use per search head cluster
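+# A self-contained sketch of the macro-name extraction used below, against a
+# hypothetical search string (my_macro is a placeholder):
+#   | makeresults
+#   | eval search="index=foo `my_macro` | stats count"
+#   | rex field=search "`\s*(?P<macro>.*?)\s*`" max_match=0
+#   | table macro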
+dispatch.earliest_time = -65m@m
+dispatch.latest_time = -5m@m
+display.events.fields = ["index","sourcetype","host"]
+display.general.type = statistics
+enableSched = 0
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = index=_audit `searchheadhosts` TERM(info=completed) search_id!="'rsa_*" search_id!="'RemoteStorageRetrieveBuckets_*" \
+| rex "(?s), search='(?P<search>.*)\]$" \
+| rex field=search mode=sed "s/\n/ /g" \
+| rex field=search mode=sed "s/```.*?```/ /g" \
+| eval search=if(substr(search,len(search),len(search)-1)=="'",substr(search,0,len(search)-1),search)\
+| rex field=search "`\s*(?P<macro>.*?)\s*`" max_match=0\
+| eval search_head=host\
+| eval search_head_cluster=`search_head_cluster`\
+| stats count by macro, search_head_cluster\
+| rex field=macro "(?P<commas>,)" max_match=0\
+``` for those who like to do () on the end of a macro call ```\
+| rex mode=sed field=macro "s/\(\)$//g"\
+| eval comma_count=mvcount(commas)\
+| fillnull comma_count value=0 \
+| eval args=if(match(macro, "\("),"true",null())\
+| eval comma_count=if(isnotnull(args),comma_count + 1,null()) \
+``` remove the arguments after the macro if they exist ```\
+| rex field=macro mode=sed "s/\(.*?\)$//"\
+| eval macro_name=if(isnull(comma_count),macro,macro . "(" . comma_count . ")")\
+``` filter any macros that include the " symbol or | symbol, as these are unlikely to be real macros ```\
+| regex macro!="(\"|\||^$|\[|\?)"\
+| stats count by macro_name, search_head_cluster\
+``` filter out empty macros ```\
+| search macro_name!=""
+
+[SearchHeadLevel - indexes per savedsearch]
+action.email.useNSSubject = 1
+alert.track = 0
+cron_schedule = 38 * * * *
+description = Report only? Yes. Determine the indexes in savedsearches; you could use macro substitution to find more accurate results (as per the "SearchHeadLevel - Search Queries summary exact match" search), but this one is just a simple example. The goal is only to check scheduled savedsearches in this example.
+dispatch.earliest_time = -65m@m
+dispatch.latest_time = -5m@m
+display.events.fields = ["index","sourcetype","host"]
+display.general.type = statistics
+enableSched = 0
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = | rest /servicesNS/-/-/saved/searches f=next_scheduled_time f=search f=qualifiedSearch count=0 search="disabled=0" search="is_scheduled=1" f=eai:acl* `splunkadmins_restmacro` timeout=900 \
+| rex field=qualifiedSearch mode=sed "s/```.*?```/ /g" \
+| regex qualifiedSearch="^\s*(\|?)\s*(search|tstats|mstats|mcatalog|multisearch|union|set|summarize|datamodel|from\s*:?\s*datamodel|datamodelsimple)\s+" \
+| rex field=search max_match=50 "(?s)\|?\s*(union|set|multisearch)\s+(?P<part1>\[.*?\](\s*\[.*?\])+\s*(`[^`]+`\s*)*(\||$|',\s+))" \
+| rex field=part1 max_match=50 "(?s).*?\[(?P<subsearch>.*?)\]\s*(\||$|)" \
+| rex field=search max_match=50 "(?s)\|?\s*(map)\s+(maxsearches\s*=\s*\d+)?\s*search\s*=\s*\"(?P<subsearch>.*?)\"\s*(\||$)" \
+| rex field=search "(?s)^(?P<prepipe>\s*\|?([^\|]+))" \
+| rex field=subsearch "(?s)^\s*\|?(?P<prepipe_subsearch>([^\|]+))" \
+| nomv prepipe_subsearch \
+| fillnull prepipe_subsearch value=" " \
+| eval prepipe = prepipe . " " . 
prepipe_subsearch \
+| rex field=prepipe "(?s)(NOT\s+index(\s*=\s*|::)[^ ]+)|(NOT\s+\([^\)]+\))|(index(\s*=\s*|::)\"?(?P<indexregex>[\*A-Za-z0-9-_]+))" max_match=50 \
+| rex field=prepipe "(?s)(NOT\s+index\s+[iI][nN]\s*\([^\)]+)|(index\s+[iI][nN]\s*\((?P<indexin>([^\)\"]+)|\"[^\)\"]+\"))" max_match=50 \
+| makemv delim="," indexin \
+| makemv delim=" " indexin \
+| eval indexes=mvappend(indexregex,indexin) \
+| eval indexes=mvmap(indexes, replace(lower(indexes), "\"", "")) \
+| eval indexes=mvmap(indexes, trim(replace(indexes, "'", ""))) \
+| eval indexes=mvdedup(indexes) \
+| eval count=mvcount(indexes) \
+| rename eai:acl.app AS app, eai:acl.owner AS owner, eai:acl.sharing AS sharing \
+| table title, app, indexes, count, owner, sharing, updated
+
+[SearchHeadLevel - platform_stats.remote_searches metrics populating search 24 hour]
+action.email.useNSSubject = 1
+alert.track = 0
+cron_schedule = 38 5 * * *
+description = Report only? Yes. Metrics? Yes. This summary (mcollect) search attempts to find stats via the remote_searches.log on the indexing tier (useful if you do not have audit logs for all search heads; note realtime_schedule = 0). Note: tested on 7.3 only, may not work on earlier versions
+dispatch.earliest_time = -24h-5m@m
+dispatch.latest_time = -5m@m
+display.events.fields = ["index","sourcetype","host"]
+display.general.type = statistics
+enableSched = 0
+realtime_schedule = 0
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = index=_internal `indexerhosts` sourcetype=splunkd_remote_searches source="/opt/splunk/var/log/splunk/remote_searches.log" terminated: OR closed: ```Note that keyword starting has the apiStartTime, apiEndTime stats, but lacks the useful stats from a search that is complete. Also note that on indexers scan_count=events_count (in my testing). Finally the elapsedTime sometimes failed to auto-extract, perhaps due to length...```\
+| regex search!="^(pretypeahead|copybuckets)"\
+| regex search!="^presummarize (tstats=t maintain=\"\" summaryprefix=\"[^\"]+\"|maintain=\"%22SUMMARY_ID%22%2C%22EARLIEST_TIME%22%2C%22REMOTE_SEARCH%22%2C%22NORM_SUMMARY_ID%22%2C%22NORM_REMOTE_SEARCH%22%0A\" summaryprefix=\"[^\"]+\")\s*$"\
+| rex "(terminated|closed): search_id=(?P<search_id>[^,]+)"\
+| eval indexer_cluster=`indexer_cluster_name(host)`\
+| stats dc(search_id) AS platform_stats.remote_searches.search_count_24hour by indexer_cluster\
+| addinfo\
+| rename info_max_time AS _time\
+| fields - info_*
+
+[IndexerLevel - events per second benchmark]
+action.email.useNSSubject = 1
+alert.track = 0
+cron_schedule = 38 5 * * *
+description = Report only? Yes. This is an example search for using the search_telemetry to attempt to measure (very roughly) the events/second returned by the indexing tier. This appears to be the best measurement I can find so far... Note you would need a list of savedsearches instead of this example
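+# A minimal ad-hoc sketch of the events/second calculation, dropping the app's
+# macros (field names as per sourcetype=search_telemetry):
+#   index=_introspection sourcetype=search_telemetry
+#   | eval events_per_second=round('perf.scan_count'/'phases.phase_0.elapsed_time_aggregations.avg',2)
+#   | table desc.savedsearch_name, events_per_second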
+dispatch.earliest_time = -24h-5m@m
+dispatch.latest_time = -5m@m
+display.events.fields = ["index","sourcetype","host"]
+display.general.type = statistics
+enableSched = 0
+realtime_schedule = 0
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = index=_introspection sourcetype=search_telemetry `searchheadhosts` `splunkadmins_events_per_second`\
+| table _time, perf.scan_count, phases.phase_0.elapsed_time_aggregations.avg, perf.search_runtime_secs, desc.savedsearch_name\
+| rename perf.scan_count AS scan_count, phases.phase_0.elapsed_time_aggregations.avg AS indexer_avg, perf.search_runtime_secs AS search_runtime, desc.savedsearch_name AS savedsearch_name\
+``` note that if scan_count exceeds 500K the stats disappear in 9.0.3/9.1.2/9.2.2, which is frustrating. Ad-hoc searches do not have this issue ```\
+| eval events_per_second=round(scan_count/indexer_avg,2)\
+| eval indexer_avg=round(indexer_avg,2), search_runtime=round(search_runtime,2)\
+| fillnull indexer_avg events_per_second
+
+[IndexerLevel - savedsearches by indexer execution time]
+action.email.useNSSubject = 1
+alert.track = 0
+cron_schedule = 38 5 * * *
+description = Report only? Yes. This search helps find savedsearches that are running for a certain period of time at the indexing tier
+dispatch.earliest_time = -24h-5m@m
+dispatch.latest_time = -5m@m
+display.events.fields = ["index","sourcetype","host"]
+display.general.type = statistics
+enableSched = 0
+realtime_schedule = 0
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = index=_introspection sourcetype=search_telemetry `searchheadhosts`\
+| eval search_head=host\
+| eval search_head_cluster=`search_head_cluster`\
+| stats perc95(phases.phase_0.elapsed_time_aggregations.avg) AS indexer_runtime, values(host) AS hosts by desc.savedsearch_name, search_head_cluster\
+| where indexer_runtime>10 AND NOT match('desc.savedsearch_name', "^_ACCELERATE")\
+| sort indexer_runtime\
+| streamstats count by search_head_cluster\
+| where count<50\
+| sort - indexer_runtime \
+| eval hosts=mvjoin(hosts,",")\
+| eval example_search="| savedsearch \"SearchHeadLevel - Indexes for savedsearch without subsearches\" savedsearch_name=\"" . 'desc.savedsearch_name' . "\" host=\"" . hosts . "\""
+
+[SearchHeadLevel - Indexes for savedsearch without subsearches]
+action.email.useNSSubject = 1
+alert.track = 0
+cron_schedule = 38 5 * * *
+description = Report only? Yes. This search provides a list of indexes if there are no subsearches in use...which can be useful for benchmarking/comparing performance over time. Note this does not do macro substitution (or eventtypes/tags, but that is possible with the other examples)
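+# Usage sketch: the example_search field built by "IndexerLevel - savedsearches by
+# indexer execution time" above generates exactly this form (names are placeholders):
+#   | savedsearch "SearchHeadLevel - Indexes for savedsearch without subsearches" savedsearch_name="My Search" host="sh1,sh2"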
+dispatch.earliest_time = -24h-5m@m
+dispatch.latest_time = -5m@m
+display.events.fields = ["index","sourcetype","host"]
+display.general.type = statistics
+enableSched = 0
+realtime_schedule = 0
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = index=_audit savedsearch_name="$savedsearch_name$" host IN ($host$) \
+| rex "(?s), search='(?P<search>.*)\]$" \
+| rex field=search mode=sed "s/```.*?```/ /g" \
+| regex search="^\s*(\|?)\s*(search|tstats|mstats|mcatalog|multisearch|union|set|summarize|datamodel|from\s*:?\s*datamodel|datamodelsimple)\s+" \
+| regex search!="(\||^)\s*(append|union|multisearch|set|appendcols|appendpipe|join|map)" \
+| rex field=search "(?s)^(?P<prepipe>\s*\|?([^\|]+))" \
+| rex field=prepipe "(?s)(NOT\s+index(\s*=\s*|::)[^ ]+)|(NOT\s+\([^\)]+\))|(index(\s*=\s*|::)\"?(?P<indexregex>[\*A-Za-z0-9-_]+))" max_match=50 \
+| rex field=prepipe "(?s)(NOT\s+index\s+[iI][nN]\s*\([^\)]+)|(index\s+[iI][nN]\s*\((?P<indexin>([^\)\"]+)|\"[^\)\"]+\"))" max_match=50 \
+| makemv delim="," indexin \
+| eval indexes=mvappend(indexregex,indexin) \
+| eval indexes=if(isnotnull(esstylewildcard),mvfilter(NOT match(indexes,"^_?\*$")),indexes) \
+| eval wildcard=mvfilter(match(indexes,"\*")) \
+| where isnull(wildcard) \
+| eval indexes=mvmap(indexes, replace(lower(indexes), "\"", "")) \
+| eval indexes=mvmap(indexes, trim(replace(indexes, "'", ""))) \
+| eval indexes=mvdedup(indexes) \
+| eval count=mvcount(indexes) \
+```| where count==1 \
+| search indexes!=_* ```\
+| stats count by indexes
+
+[SearchHeadLevel - Lookup definitions with no lookup file or kvstore collection]
+action.email.useNSSubject = 1
+alert.track = 0
+cron_schedule = 38 5 * * *
+description = Report only? Yes. This search attempts to find lookup definitions in all applications that refer to kvstore collections or lookup files that no longer exist. Therefore these lookup definitions may be deleted at this point.
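+# A quick sketch to eyeball lookup definitions and their backing files/collections
+# before deleting anything (same endpoint as used in the search below):
+#   | rest /servicesNS/-/-/data/transforms/lookups splunk_server=local timeout=900 count=0 f=eai:* f=type f=filename f=collection
+#   | table title, type, filename, collection, eai:acl.app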
+dispatch.earliest_time = -24h-5m@m
+dispatch.latest_time = -5m@m
+display.events.fields = ["index","sourcetype","host"]
+display.general.type = statistics
+enableSched = 0
+realtime_schedule = 0
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = | rest /servicesNS/-/-/data/lookup-table-files splunk_server=local timeout=900 count=0 f=title f=eai:* ``` this must be the first command as it auto-finalizes after 60 seconds ``` \
+| table eai:acl.app title \
+| rename title AS filename \
+| append \
+ [| rest /servicesNS/-/-/data/transforms/lookups splunk_server=local timeout=900 count=0 search=eai:acl.removable=1 f=eai:* f=type f=filename f=collection ] \
+| eventstats list(filename) AS files by eai:acl.app, filename \
+| eventstats list(filename) AS globalfiles by filename, eai:acl.sharing \
+| where isnotnull(title) \
+| where isnull(files) OR mvcount(files)=1 AND NOT (isnotnull(globalfiles) AND 'eai:acl.sharing'="global") \
+| eval transform=title \
+``` exclude automatic lookups as we don't want to remove definitions from lookups in use ``` \
+| search NOT [| rest "/servicesNS/-/-/data/props/lookups" count=0 timeout=900 splunk_server=local f=transform f=eai:* | table transform eai:acl.app ] \
+| join type=outer eai:acl.app collection [ | rest "/servicesNS/-/-/storage/collections/config" count=0 timeout=900 splunk_server=local f=eai:* | table eai:acl.app title | rename title AS collection | eval lookup_found="true" ] \
+| where isnull(lookup_found) \
+| rename eai:acl.app AS app, title AS lookup_definition_name, author AS owner, eai:acl.sharing AS sharing \
+| table app, lookup_definition_name, owner, sharing, type, updated
+
+[SearchHeadLevel - User created kvstore collections]
+action.email.useNSSubject = 1
+alert.track = 0
+cron_schedule = 38 5 * * *
+description = Report only? Yes. This search attempts to find any kvstore collection that can be removed as part of a cleanup of existing kvstores. To do this, all Splunkbase apps are excluded, along with any kvstores referenced by automatic lookups
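+# A quick sketch listing kvstore collections and their owning apps (same endpoint
+# as the search below):
+#   | rest "/servicesNS/-/-/storage/collections/config" count=0 timeout=900 splunk_server=local f=eai:*
+#   | table title, eai:acl.app, eai:acl.sharing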
+dispatch.earliest_time = -24h-5m@m
+dispatch.latest_time = -5m@m
+display.events.fields = ["index","sourcetype","host"]
+display.general.type = statistics
+enableSched = 0
+realtime_schedule = 0
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = | rest "/servicesNS/-/-/storage/collections/config" count=0 timeout=900 splunk_server=local f=eai:* search=eai:acl.removable=1 \
+ ``` exclude all splunkbase apps ``` \
+| search \
+ [| rest "/services/apps/local" count=0 timeout=900 splunk_server=local search=core=0 f=details \
+ | search NOT details=https://apps.splunk.com* \
+ | rename title AS eai:acl.app \
+ | table eai:acl.app ] \
+ ``` additionally exclude anything used by automatic lookups ``` \
+| search NOT ([| rest "/servicesNS/-/-/data/props/lookups" count=0 timeout=900 splunk_server=local f=transform f=eai:* f=stanza \
+ | join transform \
+ [| rest /servicesNS/-/-/data/transforms/lookups splunk_server=local search=type=kvstore f=collection f=eai:* timeout=900 count=0 \
+ | rename title AS transform ] \
+ | rename transform AS title | table title ] ) \
+| join type=outer collection [ | rest /servicesNS/-/-/data/transforms/lookups splunk_server=local search=type=kvstore f=collection f=eai:* timeout=900 count=0 ] \
+| rename eai:acl.app AS app, eai:acl.sharing AS sharing, author AS owner, title AS lookup_definition_name \
+| table collection, owner, app, sharing, lookup_definition_name
+
+[SearchHeadLevel - Search Queries summary loadjob and savedsearch usage in audit logs]
+action.email.useNSSubject = 1
+alert.track = 0
+cron_schedule = 38 5 * * *
+description = Report only? Yes. This search attempts to find any use of | loadjob or | savedsearch within the audit logs. Macro substitution is not used but could be included (although I'm unsure how often someone calls a | savedsearch or | loadjob via a macro)
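+# A self-contained sketch of the loadjob name extraction used below, against a
+# hypothetical value ("admin:search:My Report" is a placeholder):
+#   | makeresults
+#   | eval search="| loadjob savedsearch=\"admin:search:My Report\""
+#   | rex field=search "\|\s*loadjob savedsearch=\"[^:]+:[^:]+:(?P<identified_savedsearch_name3>[^\"]+)"
+#   | table identified_savedsearch_name3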
+dispatch.earliest_time = -24h-5m@m
+dispatch.latest_time = -5m@m
+display.events.fields = ["index","sourcetype","host"]
+display.general.type = statistics
+enableSched = 0
+realtime_schedule = 0
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = index=_audit "info=completed" search_id!="'SummaryDirector_*" search_id!="'rsa_*" search_id!="'RemoteStorageRetrieveBuckets_*" search_id!="'RemoteStorageRetrieveIndexes*" search_id!="'ta_*" \
+| rex "(?s), search='(?P<search>.*)\]$" \
+| rex field=search mode=sed "s/\n/ /g" \
+| rex field=search mode=sed "s/```.*?```/ /g" \
+| eval search=if(substr(search,len(search),len(search)-1)=="'",substr(search,0,len(search)-1),search) \
+| eval search_id=replace(search_id,"'","") \
+| regex search="\|\s*loadjob\s*savedsearch=|\|\s*savedsearch" \
+| rex field=search "\|\s*savedsearch\s+(\"(?P<identified_savedsearch_name>[^\"']+)\"|(?P<identified_savedsearch_name2>[^ ']+))" \
+| rex field=search "\|\s*loadjob savedsearch=\"[^:]+:[^:]+:(?P<identified_savedsearch_name3>[^\"]+)" \
+| eval identified_savedsearch_name=coalesce(identified_savedsearch_name,identified_savedsearch_name2,identified_savedsearch_name3) \
+| where isnotnull(identified_savedsearch_name)\
+| search NOT identified_savedsearch_name IN ("instrumentation.topology*", "instrumentation.usage*", "instrumentation.upgrade*", "instrumentation.deployment*", "instrumentation.performance*", "instrumentation.app*", "instrumentation.licensing*", "instrumentation.authentication*")\
+| eval method=if(isnull(identified_savedsearch_name3),"savedsearch","loadjob")\
+| eval search_head=host \
+| eval env=`search_head_cluster`\
+| stats values(savedsearch_name) AS calling_savedsearch_name by _time, user, provenance, mode, app, identified_savedsearch_name, env, method
+
+[SearchHeadLevel - configtracker index example2]
+action.email.useNSSubject = 1
+alert.track = 0
+cron_schedule = 38 * * * *
+description = Report only? Yes. As seen on community slack, posted by Martin Mueller. This query provides a summary of changes using the _configtracker index for a particular host
+dispatch.earliest_time = -65m@m
+dispatch.latest_time = -5m@m
+display.events.fields = ["index","sourcetype","host"]
+display.general.type = statistics
+enableSched = 0
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = index=_configtracker host=example \
+| table _time _raw \
+| spath data.path \
+| spath data.action \
+| spath data.changes{} output=changes \
+| fields - _raw \
+| mvexpand changes \
+| spath input=changes stanza \
+| spath input=changes properties{} output=properties \
+| fields - changes \
+| mvexpand properties \
+| spath input=properties \
+| fields - properties \
+| sort - _time \
+| transaction maxspan=5s data.path stanza name \
+| fields - _raw field_match_sum linecount closed_txn duration \
+| where NOT new_value=old_value
+
+[SearchHeadLevel - Job performance data per indexer handoff time]
+action.email.useNSSubject = 1
+alert.track = 0
+cron_schedule = 38 * * * *
+description = Report only? Yes. This report shows the job performance per indexer for a particular SID, focusing on handoff time. This is useful to determine if indexers which are not optimised out of the search are causing performance issues for queries. This is visible in the job inspector once the setting is enabled (no restart required).
You will need to change this to the correct SID, and you will need to enable this setting (commented to avoid appinspect issues): \
+#[search_metrics] \
+#debug_metrics=true \
+in the limits.conf file
+dispatch.earliest_time = -65m@m
+dispatch.latest_time = -5m@m
+display.events.fields = ["index","sourcetype","host"]
+display.general.type = statistics
+enableSched = 0
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = | rest `splunkadmins_restmacro` /services/search/jobs/1727754595.5060908_ACE59D32-AF45-4CE5-B669-820A02A35151 \
+``` this is useful when limits.conf on the search head has: \
+[search_metrics] \
+debug_metrics=true \
+\
+No restart required``` \
+| fields *handoff.duration_secs \
+| untable perf field value \
+| rex field=field "performance\.phase_0\.(?<indexer>.*)\.(?P<type>.*)" \
+| stats max(value) AS value by indexer \
+| sort - value
+
+[SearchHeadLevel - KVStore collection size]
+action.email.useNSSubject = 1
+alert.track = 0
+cron_schedule = 38 * * * *
+description = Report only? Yes. This shows the size of each kvstore collection on a search head and the count of entries
+dispatch.earliest_time = -65m@m
+dispatch.latest_time = -5m@m
+display.events.fields = ["index","sourcetype","host"]
+display.general.type = statistics
+enableSched = 0
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = | rest `splunkadmins_restmacro` /servicesNS/-/-/admin/kvstore-collectionstats \
+| table data \
+| mvexpand data \
+| spath input=data \
+| table ns size count \
+| rename ns as name \
+| eval sizeInMB=round(size/1024/1024) \
+| sort - sizeInMB
+
+[SearchHeadLevel - Savedsearches with schedules and no next_scheduled_time]
+action.email.useNSSubject = 1
+alert.track = 0
+cron_schedule = 38 * * * *
+description = Report only? Yes. This report lists searches that are configured with a schedule but cannot run due to the owner no longer existing or a lack of permissions on the account itself
+dispatch.earliest_time = -65m@m
+dispatch.latest_time = -5m@m
+display.events.fields = ["index","sourcetype","host"]
+display.general.type = statistics
+enableSched = 0
+request.ui_dispatch_app = SplunkAdmins
+request.ui_dispatch_view = search
+search = | rest /servicesNS/-/-/saved/searches count=0 search="disabled=0" search="is_scheduled=1" f=next_scheduled_time `splunkadmins_restmacro` f=title f=eai:* \
+| search next_scheduled_time="" \
+| table author, eai:acl.app, title, next_scheduled_time
diff --git a/apps/SplunkAdmins/default/transforms.conf b/apps/SplunkAdmins/default/transforms.conf
new file mode 100644
index 00000000..14453434
--- /dev/null
+++ b/apps/SplunkAdmins/default/transforms.conf
@@ -0,0 +1,85 @@
+[setNull]
+REGEX = .
+DEST_KEY = queue
+FORMAT = nullQueue
+
+[setError]
+REGEX = ^[01]\d-[0-3]\d-20\d\d \d{2}:\d{2}:\d{2}.\d{3}\s+ERROR\s+
+DEST_KEY = queue
+FORMAT = indexQueue
+
+[setAutoFinalize]
+REGEX = Search auto-finalized after
+DEST_KEY = queue
+FORMAT = indexQueue
+
+#Only include warning or error entries
+[setWARNorERROR]
+REGEX = ,(?:ERROR|WARN),
+DEST_KEY = queue
+FORMAT = indexQueue
+
+[splunkadmins_macros]
+#This config failed below with ERROR KVStoreLookup - KV Store output failed with err: The provided query was invalid. (Document may not contain '$' or '.' in keys.)
message:
+#Switching back to csv files for now
+#collection = splunkadmins_macros
+#external_type = kvstore
+#fields_list = definition, eai:acl.app, title
+batch_index_query = 0
+case_sensitive_match = 1
+collection =
+external_type =
+fields_list =
+filename = splunkadmins_macros.csv
+
+[splunkadmins_userlist_indexinfo]
+collection = splunkadmins_userlist_indexinfo
+#external_type = kvstore
+#fields_list = srchIndexesAllowed, srchIndexesDefault, user
+filename = splunkadmins_userlist_indexinfo.csv
+
+[splunkadmins_indexlist]
+batch_index_query = 0
+case_sensitive_match = 1
+filename = splunkadmins_indexlist.csv
+
+[splunkadmins_indexes_per_role]
+batch_index_query = 0
+case_sensitive_match = 1
+filename = splunkadmins_indexes_per_role.csv
+
+[splunkadmins_datamodels]
+batch_index_query = 0
+case_sensitive_match = 0
+filename = splunkadmins_datamodels.csv
+
+[splunkadmins_tags]
+batch_index_query = 0
+case_sensitive_match = 0
+filename = splunkadmins_tags.csv
+
+[splunkadmins_eventtypes]
+batch_index_query = 0
+case_sensitive_match = 0
+filename = splunkadmins_eventtypes.csv
+
+[splunkadmins_rmd5_to_savedsearchname]
+batch_index_query = 0
+case_sensitive_match = 0
+filename = splunkadmins_rmd5_to_savedsearchname.csv
+
+[splunkadmins_indexlist_by_cluster]
+batch_index_query = 0
+case_sensitive_match = 1
+filename = splunkadmins_indexlist_by_cluster.csv
+
+#Note that the lookup splunkadmins_hec_reply_code_lookup is based on https://github.com/redvelociraptor/gettingsmarter/blob/main/dashboards/hec_reply_codes.csv (previously https://docs.splunk.com/Documentation/Splunk/latest/Data/TroubleshootHTTPEventCollector) and this may change over time
+[splunkadmins_hec_reply_code_lookup]
+batch_index_query = 0
+case_sensitive_match = 1
+filename = splunkadmins_hec_reply_code_lookup.csv
+
+[splunkadmins_lookupfile_owners]
+batch_index_query = 0
+case_sensitive_match = 1
+filename = splunkadmins_lookupfile_owners.csv
diff --git a/apps/SplunkAdmins/lookups/splunkadmins_datamodels.csv b/apps/SplunkAdmins/lookups/splunkadmins_datamodels.csv
new file mode 100644
index 00000000..9ac87f79
--- /dev/null
+++ b/apps/SplunkAdmins/lookups/splunkadmins_datamodels.csv
@@ -0,0 +1 @@
+datamodel,sharing,app,definition,splunk_server
diff --git a/apps/SplunkAdmins/lookups/splunkadmins_eventtypes.csv b/apps/SplunkAdmins/lookups/splunkadmins_eventtypes.csv
new file mode 100644
index 00000000..965678fd
--- /dev/null
+++ b/apps/SplunkAdmins/lookups/splunkadmins_eventtypes.csv
@@ -0,0 +1 @@
+eventtype,definition,app,sharing,"splunk_server"
diff --git a/apps/SplunkAdmins/lookups/splunkadmins_hec_reply_code_lookup.csv b/apps/SplunkAdmins/lookups/splunkadmins_hec_reply_code_lookup.csv
new file mode 100644
index 00000000..d1beb8b7
--- /dev/null
+++ b/apps/SplunkAdmins/lookups/splunkadmins_hec_reply_code_lookup.csv
@@ -0,0 +1,24 @@
+status_code,http_status_code_id,http_status_code_text,status_message,reason,action
+0,200,OK,Success,,
+1,403,Forbidden,Token disabled,Client is sending using a disabled token,Splunk Admin needs to enable the token or have the client use a new token.
+2,401,Unauthorized,Token is required,Client is sending without a token,Splunk Admin needs to find what client is trying to send without a token.
+3,401,Unauthorized,Invalid authorization,Client is sending with an incorrect Authorization Header,"Splunk Admin needs to work with client/user to ensure the Authorization Header is correct; the most common cause is that the word Splunk is missing before the token."
+4,403,Forbidden,Invalid token,Client is sending with a token the receiver(s) don't know of,Splunk Admin needs to work with client/user to ensure they are using a valid token.
+5,400,Bad Request,No data,Client is sending without any data,Splunk Admin needs to work with client/user to ensure the sending side is configured to send data properly. If there is a token and a channel ID with no payload this is more than likely AWS Firehose's second connection to ensure it can send data to Splunk. It's testing the event endpoint and it's expecting to get a 400 reply code.
+6,400,Bad Request,Invalid data format,Client is sending with data in an invalid format,"Splunk Admin needs to work with client/user to ensure the sending side is using a proper format; the raw source should be looked at and the log entry for parsing_err will point to what to look for. In Splunk versions newer than 8.1.2103, as a last resort debug can be used."
+7,400,Bad Request,Incorrect index,Client is trying to send to an index not in the tokens allow list,Splunk Admin needs to work with client/user to ensure the sending side is trying to send to indexes listed in the tokens allow list. Correction can be on the client sending side or adding the index to the token in Splunk.
+8,500,Internal Error,Internal server error,Receiver had an issue client should retry to send,Client should automatically try to resend the data. If the issue happens too often then a support case should be filed so that the issue can be investigated deeper.
+9,503,Service Unavailable,Server is busy,Receiver had an issue receiving client should retry to send,"Client should automatically try to resend data, occasional Server Is Busy messages are expected. If the message happens too often a support case should be filed and investigated further."
+10,400,Bad Request,Data channel is missing,Client is trying to send to a token that has useACK enabled channel id is needed,Splunk Admin needs to work with client/user to ensure they are using the correct token and the sending side is configured properly.
+11,400,Bad Request,Invalid data channel,Client is trying to send with an improperly formatted data channel id,Splunk Admin needs to work with the client/user to ensure they send using a properly formatted data channel id.
+12,400,Bad Request,Event field is required,Client is trying to send without an event field,Splunk Admin needs to work with the client/user to ensure they are sending in a proper format. An event field is not being sent.
+13,400,Bad Request,Event field cannot be blank,Client is trying to send with an empty event field,Splunk Admin needs to work with the client/user to ensure they are sending in a proper format. The event field is empty.
+14,400,Bad Request,ACK is disabled,Client is trying to use useACK on a token that it is not enabled on,Splunk Admin needs to work with the client/user to ensure they are using the correct token for their data in the proper format.
+15,400,Bad Request,Error in handling indexed fields,Client is trying to send where index fields are incorrect,Splunk Admin needs to work with the client/user to ensure they are using index fields correctly for HEC.
+16,400,Bad Request,Query string authorization is not enabled,Client is trying to send with query string authorization where it is not enabled,Splunk Admin needs to open a Support case to enable query string authorization to the token. Understand the security risk of Query string authorization. The HEC token can be logged in plain text as part of the URL.
+17,200,OK,HEC is healthy,,
+18,503,Service Unavailable,"HEC is unhealthy, queues are full",Receiver Queues are full,
+19,503,Service Unavailable,"HEC is unhealthy, ack service unavailable",,
+20,503,Service Unavailable,"HEC is unhealthy, queues are full, ack service unavailable",,
+21,400,Bad Request,Invalid token,,
+22,400,Bad Request,Token disabled,,
diff --git a/apps/SplunkAdmins/lookups/splunkadmins_indexes_per_role.csv b/apps/SplunkAdmins/lookups/splunkadmins_indexes_per_role.csv
new file mode 100644
index 00000000..f21f58c0
--- /dev/null
+++ b/apps/SplunkAdmins/lookups/splunkadmins_indexes_per_role.csv
@@ -0,0 +1 @@
+roles,"splunk_server",srchIndexesAllowed,srchIndexesDefault
diff --git a/apps/SplunkAdmins/lookups/splunkadmins_indexlist.csv b/apps/SplunkAdmins/lookups/splunkadmins_indexlist.csv
new file mode 100644
index 00000000..9015a7a3
--- /dev/null
+++ b/apps/SplunkAdmins/lookups/splunkadmins_indexlist.csv
@@ -0,0 +1 @@
+index
diff --git a/apps/SplunkAdmins/lookups/splunkadmins_indexlist_by_cluster.csv b/apps/SplunkAdmins/lookups/splunkadmins_indexlist_by_cluster.csv
new file mode 100644
index 00000000..ceb99f3b
--- /dev/null
+++ b/apps/SplunkAdmins/lookups/splunkadmins_indexlist_by_cluster.csv
@@ -0,0 +1 @@
+indexer_cluster,index
diff --git a/apps/SplunkAdmins/lookups/splunkadmins_lookupfile_owners.csv b/apps/SplunkAdmins/lookups/splunkadmins_lookupfile_owners.csv
new file mode 100644
index 00000000..15902827
--- /dev/null
+++ b/apps/SplunkAdmins/lookups/splunkadmins_lookupfile_owners.csv
@@ -0,0 +1 @@
+lookup_file,app,owner,owner_from_definition,sharing,lookup_name
diff --git a/apps/SplunkAdmins/lookups/splunkadmins_macros.csv b/apps/SplunkAdmins/lookups/splunkadmins_macros.csv
new file mode 100644
index 00000000..c03f66e4
--- /dev/null
+++ b/apps/SplunkAdmins/lookups/splunkadmins_macros.csv
@@ -0,0 +1 @@
+title,app,"splunk_server",definition,sharing
diff --git a/apps/SplunkAdmins/lookups/splunkadmins_rmd5_to_savedsearchname.csv b/apps/SplunkAdmins/lookups/splunkadmins_rmd5_to_savedsearchname.csv
new file mode 100644
index 00000000..87c4024c
--- /dev/null
+++ b/apps/SplunkAdmins/lookups/splunkadmins_rmd5_to_savedsearchname.csv
@@ -0,0 +1 @@
+RMDvalue,savedsearch_name
diff --git a/apps/SplunkAdmins/lookups/splunkadmins_tags.csv b/apps/SplunkAdmins/lookups/splunkadmins_tags.csv
new file mode 100644
index 00000000..ba5794a4
--- /dev/null
+++ b/apps/SplunkAdmins/lookups/splunkadmins_tags.csv
@@ -0,0 +1 @@
+tag,definition,app,sharing,"splunk_server"
diff --git a/apps/SplunkAdmins/lookups/splunkadmins_userlist_indexinfo.csv b/apps/SplunkAdmins/lookups/splunkadmins_userlist_indexinfo.csv
new file mode 100644
index 00000000..b4f75ca8
--- /dev/null
+++ b/apps/SplunkAdmins/lookups/splunkadmins_userlist_indexinfo.csv
@@ -0,0 +1 @@
+srchIndexesAllowed,srchIndexesDefault,user
diff --git a/apps/SplunkAdmins/metadata/default.meta b/apps/SplunkAdmins/metadata/default.meta
new file mode 100644
index 00000000..6a01d0a5
--- /dev/null
+++ b/apps/SplunkAdmins/metadata/default.meta
@@ -0,0 +1,23 @@
+# Application-level permissions
+[]
+access = read : [ admin, sc_admin ], write : [ admin, sc_admin ]
+
+[eventtypes]
+export = none
+
+[props]
+export = none
+
+[transforms]
+export = none
+
+[lookups]
+export = none
+
+[tags]
+export = none
+
+[viewstates]
+access = read : [ * ], write : [ * ]
+export = none
+
diff --git a/apps/SplunkAdmins/splunkbase.manifest b/apps/SplunkAdmins/splunkbase.manifest
new file mode 100644
index 00000000..e4bb03be
--- /dev/null
+++ b/apps/SplunkAdmins/splunkbase.manifest
@@ -0,0 +1,271 @@ +{ + "version": "1.0", + "date": "2024-11-18T21:26:24.560754613Z", + "hashAlgorithm": "SHA-256", + "app": { + "id": 3796, + "version": "4.0.1", + "files": [ + { + "path": "default/app.conf", + "hash": "b67935fa9e332c7889406e4380fea757f826bb863437e6329493669b9df562a1" + }, + { + "path": "default/data/ui/nav/default.xml", + "hash": "1864d3aeaeac7ee0c49c93e0e41609d67da7783a175638b08ed31f9a8b9f328d" + }, + { + "path": "default/data/ui/views/ClusterMasterJobs.xml", + "hash": "ace418e8530449f73e9d7d91f6e6f57002e234c43c5a04e22866bfaa525f7949" + }, + { + "path": "default/data/ui/views/data_model_rebuild_monitor.xml", + "hash": "10690251de7d55a3da368d0bed0e0acd90e56285118aa3fbd88bc344eabbab5e" + }, + { + "path": "default/data/ui/views/data_model_status.xml", + "hash": "f93feda0cbb8874bc40f7623bc3ae3ef40b68811e59b2d0384f3ae9ba3208e96" + }, + { + "path": "default/data/ui/views/detect_excessive_search_use.xml", + "hash": "fbf207e014b41b7f38c21904d8e6525d493aeda1a62c81dd8a67a9a5c4939d61" + }, + { + "path": "default/data/ui/views/heavyforwarders_max_data_queue_sizes_by_name.xml", + "hash": "d787e4eb2766616fb6a76fe7944c8510b013005556fe75355b878846bc87d227" + }, + { + "path": "default/data/ui/views/heavyforwarders_max_data_queue_sizes_by_name_v8.xml", + "hash": "71c203e028bcc17f9d13f52e1f001a68605304a50a6337bcd2daf9a571bddf0e" + }, + { + "path": "default/data/ui/views/heavy_forwarder_analysis.xml", + "hash": "ca4286507e38d2f08da1e989d6a1708b9c09e48ef18851a1daf60d1cafdb798d" + }, + { + "path": "default/data/ui/views/hec_performance.xml", + "hash": "dc166a30a81c9de437b8d2c922ad581b9ad988389df156a3fef66c8b2f4fa134" + }, + { + "path": "default/data/ui/views/indexer_data_spread.xml", + "hash": "a45ddeebf77d329b45be89be753426404587c896c41b0478b2d4068892d2f071" + }, + { + "path": "default/data/ui/views/indexer_max_data_queue_sizes_by_name.xml", + "hash": "065529820d0e080a9a75699afbd7631df5de42b25e22dbf6e4ed0949b7e77b4c" + }, + { + "path": "default/data/ui/views/indexer_max_data_queue_sizes_by_name_v8.xml", + "hash": "4faaf0efed77f607a154e079706ab5f763fae170dbe5d41da41e397ddeb77cf0" + }, + { + "path": "default/data/ui/views/issues_per_sourcetype.xml", + "hash": "fd1bdf2a18f159e6b2f8ff93c5c130419a5c2a7e1fc032926928199ee5ba237e" + }, + { + "path": "default/data/ui/views/knowledge_objects_by_app.xml", + "hash": "d0a7644a87608ac53508677dd52925a997846a7577eadb1f177281f0f63aa172" + }, + { + "path": "default/data/ui/views/knowledge_objects_by_app_drilldown.xml", + "hash": "cf74079b09ffe61312c4d00b7c51ed8635a025c79b3727814c8a4425876eaa1c" + }, + { + "path": "default/data/ui/views/lookups_in_use_finder.xml", + "hash": "c6e43d1b40b08e665553774e21fdf38f5286f550c2be5d5a2e6989052d800986" + }, + { + "path": "default/data/ui/views/lookup_audit.xml", + "hash": "208681df6d96087207518af6a948834c3211c92bb65de450ded41e0dea6a090a" + }, + { + "path": "default/data/ui/views/rolled_buckets_by_index.xml", + "hash": "f9b0bc4f1655ca5252fcaab05865f38bffdeb2c61fe3a09fa8212b7334ffbf0b" + }, + { + "path": "default/data/ui/views/search_head_scheduledsearches_distribution.xml", + "hash": "86997ae930ad8a7e505e86021efd6bc4b83abbf5675295b53213bc85be13766e" + }, + { + "path": "default/data/ui/views/smartstore_stats.xml", + "hash": "5c4a7f45ee75e2f4d3a219e961fe841a428f5cb5986030929d3e2b4483e8a04d" + }, + { + "path": "default/data/ui/views/splunk_forwarder_data_balance_tuning.xml", + "hash": "7b30fd2f4fd19ae94f6b5b6fa0bf87b1810ec4bf2a2085cd143ad31fa5bc9bac" + }, + { + "path": 
"default/data/ui/views/splunk_forwarder_output_tuning.xml", + "hash": "0a9233373d6919f6668aabf669c0491519bd3c040de34c2061990985602f4197" + }, + { + "path": "default/data/ui/views/splunk_introspection_io_stats.xml", + "hash": "f101b92f4725bcd91ce05a0c35484d8f496251609abc81e5ad89213625e833de" + }, + { + "path": "default/data/ui/views/troubleshooting_indexer_cpu.xml", + "hash": "5b438d0ec47779a0c9e97b294bda0dd528a52095046c5b16129d87f85a79b4ab" + }, + { + "path": "default/data/ui/views/troubleshooting_indexer_cpu_drilldown.xml", + "hash": "710a2e4a0f6d088b1ef1cab7db9944ad8aab0a6a3a4eef03e2d4d11c038469e5" + }, + { + "path": "default/data/ui/views/troubleshooting_resource_usage_per_user.xml", + "hash": "8ea42775ea292e9fd7801ea35d7fabb4874db2c07b9558790df0a5f41aea3f10" + }, + { + "path": "default/data/ui/views/troubleshooting_resource_usage_per_user_drilldown.xml", + "hash": "3de1321b80e17059ba468dfe076208b28b3d51772537dd037b087f342cca0104" + }, + { + "path": "default/macros.conf", + "hash": "dffbdc2e99dfa520f86eaa7339c75733b34ccce8aa99f88c909be616e699a657" + }, + { + "path": "default/props.conf", + "hash": "b422e5d7410919ac19476180e1383520830592fa3da27787fdeed0d5b7262bb5" + }, + { + "path": "default/savedsearches.conf", + "hash": "4abb46669e6728ac2d171489fb049e8275f815b140a778b2ea2250b25635467c" + }, + { + "path": "default/transforms.conf", + "hash": "6fc76fe50cd62a39018b22279535b7bb475a326ccf2db1b7fe1a2b3a378fa033" + }, + { + "path": "LICENSE", + "hash": "b40930bbcf80744c86c46a12bc9da056641d722716c378f5659b9e555ef833e1" + }, + { + "path": "lookups/splunkadmins_datamodels.csv", + "hash": "1d5f73c2170040fd111d3e64f095ffd808978030d9d3817d422b214eb82be636" + }, + { + "path": "lookups/splunkadmins_eventtypes.csv", + "hash": "4f308f3c824b105eace933f06b5a170fa86dbf98c5ba348e4a05522804b0dbec" + }, + { + "path": "lookups/splunkadmins_hec_reply_code_lookup.csv", + "hash": "9c4be11e9cfa465f5d8a38c3f1ba467d00191c14d0bd767a9fe0ab7a77196b79" + }, + { + "path": "lookups/splunkadmins_indexes_per_role.csv", + "hash": "39d43de1ef29a713ad480f668375d56fba195d57d49b77b190aed712f455fd55" + }, + { + "path": "lookups/splunkadmins_indexlist.csv", + "hash": "f816b480f87144ec4de5862adf028ff66cc6964250325d53fd22bf8922824b6f" + }, + { + "path": "lookups/splunkadmins_indexlist_by_cluster.csv", + "hash": "8d953cac7d4dbd8a1cd5aa3bb488710a6eeb5d49c0c43c27930328acdf9708f0" + }, + { + "path": "lookups/splunkadmins_lookupfile_owners.csv", + "hash": "ebda123bd9d0f791eaa177d8a7c6903e3f4b1df910c57218e3cc48ed99cdbcb4" + }, + { + "path": "lookups/splunkadmins_macros.csv", + "hash": "28ecbbdbe1641776141e78ca483e310565f3023f0a2a6a539e2dc0ee752824e2" + }, + { + "path": "lookups/splunkadmins_rmd5_to_savedsearchname.csv", + "hash": "64d62548bb0741d6f76fad5ab96c168307c68d5c98251139f2c9f6c6c0574024" + }, + { + "path": "lookups/splunkadmins_tags.csv", + "hash": "d7ea98b9397ddbedb9b6471fb6cdc86a76135388e823204fcd84d91826318dfc" + }, + { + "path": "lookups/splunkadmins_userlist_indexinfo.csv", + "hash": "d9e8eabd1d316bc60a6e351339b676bd6d7b53b914f01bf3c7858cf7c92716bc" + }, + { + "path": "metadata/default.meta", + "hash": "0838ba65305ef1ae0367d6bcc5c6fd63d0f59d8e6fdb66b6aed4ff14394c613c" + }, + { + "path": "NOTICE", + "hash": "11494ae88ef9a7d75cd70b4e2c3152bd83751665a3dde0590527857496ba5440" + }, + { + "path": "README.md", + "hash": "3da128fa717ba6929a0528ec9f91057656356de738cd6ff50c989f14f50efcd4" + }, + { + "path": "static/appIcon.png", + "hash": "32f1a6833f3a9db2f6d4dcac27404459f91bef4e2898604aa1ddc168455dbc1b" + }, + { + "path": 
"static/appIconAlt.png", + "hash": "32f1a6833f3a9db2f6d4dcac27404459f91bef4e2898604aa1ddc168455dbc1b" + }, + { + "path": "static/appIconAlt_2x.png", + "hash": "8caf40b544afaaa087d232c479560a0a3c2e57b27d0f8cb38f90ba48f53256c6" + }, + { + "path": "static/appIcon_2x.png", + "hash": "8caf40b544afaaa087d232c479560a0a3c2e57b27d0f8cb38f90ba48f53256c6" + }, + { + "path": "static/appLogo.png", + "hash": "ee7abc736a4b4cbbd796383f0dce484d4efe4b1be5dc309ff6730a14a92896a0" + }, + { + "path": "static/appLogo_2x.png", + "hash": "0b483b1aec1a6c70a98bd1a58fa31406b7d946ce9cfac3ac3ae296edc7fdce28" + } + ] + }, + "products": [ + { + "platform": "splunk", + "product": "enterprise", + "versions": [ + "8.1", + "8.2", + "9.0", + "9.1", + "9.2", + "9.3" + ], + "architectures": [ + "x86_64" + ], + "operatingSystems": [ + "windows", + "linux", + "macos", + "freebsd", + "solaris", + "aix" + ] + }, + { + "platform": "splunk", + "product": "cloud", + "versions": [ + "8.1", + "8.2", + "9.0", + "9.1", + "9.2", + "9.3" + ], + "architectures": [ + "x86_64" + ], + "operatingSystems": [ + "windows", + "linux", + "macos", + "freebsd", + "solaris", + "aix" + ] + } + ] +} \ No newline at end of file diff --git a/apps/SplunkAdmins/static/appIcon.png b/apps/SplunkAdmins/static/appIcon.png new file mode 100644 index 00000000..85f1b21e Binary files /dev/null and b/apps/SplunkAdmins/static/appIcon.png differ diff --git a/apps/SplunkAdmins/static/appIconAlt.png b/apps/SplunkAdmins/static/appIconAlt.png new file mode 100644 index 00000000..85f1b21e Binary files /dev/null and b/apps/SplunkAdmins/static/appIconAlt.png differ diff --git a/apps/SplunkAdmins/static/appIconAlt_2x.png b/apps/SplunkAdmins/static/appIconAlt_2x.png new file mode 100644 index 00000000..2dd19140 Binary files /dev/null and b/apps/SplunkAdmins/static/appIconAlt_2x.png differ diff --git a/apps/SplunkAdmins/static/appIcon_2x.png b/apps/SplunkAdmins/static/appIcon_2x.png new file mode 100644 index 00000000..2dd19140 Binary files /dev/null and b/apps/SplunkAdmins/static/appIcon_2x.png differ diff --git a/apps/SplunkAdmins/static/appLogo.png b/apps/SplunkAdmins/static/appLogo.png new file mode 100644 index 00000000..eb9f2445 Binary files /dev/null and b/apps/SplunkAdmins/static/appLogo.png differ diff --git a/apps/SplunkAdmins/static/appLogo_2x.png b/apps/SplunkAdmins/static/appLogo_2x.png new file mode 100644 index 00000000..7b659707 Binary files /dev/null and b/apps/SplunkAdmins/static/appLogo_2x.png differ