Course Content
- Analytics Framework
- Exploratory Data Analysis
- Regression for Prediction
- Cleaning and Preprocessing Data
- Algorithms, Preprocessing and Feature Extraction
- Clustering Data
- Detecting Anomalies
- Forecasting
- Classification
Prerequisites
To be successful, students should have a working knowledge of the material in the following courses, or equivalent Splunk experience:
- What is Splunk? (Retired)
- Intro to Splunk (ITS)
- Using Fields (SUF)
- Scheduling Reports & Alerts (SRA)
- Visualizations (SVZ)
- Working with Time (WWT)
- Statistical Processing (SSP)
- Comparing Values (SCV)
- Result Modification (SRM)
- Leveraging Lookups and Subsearches (LLS)
- Correlation Analysis (SCLAS)
- Search Under the Hood (SUH)
- Intro to Knowledge Objects (IKO)
- Creating Field Extractions (CFE)
- Search Optimization (SSO)
Course Objectives
This 13.5-hour course is for users who want to attain operational intelligence level 4 (business insights). It covers implementing analytics and data science projects using Splunk's statistics, machine learning, and built-in and custom visualization capabilities. Both Splunk Enterprise and the Splunk Machine Learning Toolkit are covered.
Please note that this course may run over three days, with a 4.5-hour session each day.
Outline: Splunk for Analytics and Data Science (SADS)
Topic 1 – Analytics Workflow
- Define terms related to analytics and data science
- Describe the analytics workflow
- Describe common usage scenarios
- Navigate Splunk Machine Learning Toolkit
Topic 2 – Exploratory Data Analysis
- Describe the purpose of data exploration
- Identify SPL commands for data exploration
- Split data into training and test sets using the sample command
Topic 3 – Predict Numeric Fields with Regression
- Differentiate predictions from estimates
- Identify prediction algorithms and assumptions
- Describe the fit and apply commands
- Model numeric predictions in the MLTK and Splunk Enterprise
- Use the score command to evaluate models
Topic 4 – Clean and Preprocess the Data
- Define preprocessing and describe its purpose
- Describe algorithms that preprocess data for use in models
- Choose relevant fields
- Reduce dimensionality
- Normalize data
- Preprocess text
Topic 5 – Cluster Data
- Define clustering
- Identify clustering methods, algorithms, and use cases
- Use the Smart Clustering Assistant to cluster data
- Evaluate clusters using the silhouette score
- Validate cluster coherence
- Describe clustering best practices
Topic 6 – Anomaly Detection
- Define anomaly detection and outliers
- Identify anomaly detection use cases
- Use the Splunk Machine Learning Toolkit Smart Outlier Assistant
- Detect anomalies using the DensityFunction algorithm
- Optimize anomaly detection with Local Outlier Factor
- View results with the Distribution Plot visualization
Topic 7 – Estimation and Prediction
- Differentiate predictions from forecasts
- Use the Smart Forecasting Assistant
- Use the StateSpaceForecast algorithm
- Forecast multivariate data
- Account for periodicity in each time series
Topic 8 – Classification
- Define key classification terms
- Use classification algorithms
- Evaluate classifier tradeoffs
- Evaluate results of multiple algorithms