Learn Azure Sentinel

Choosing data that matters

Quality data management is critical to the success of big data analytics, which is the foundation of how a SIEM solution works. Gathering data for analysis is required in order to find security threats and unusual behavior across a vast array of infrastructure and applications. However, there needs to be a balance between capturing every possible event from all available logs and not having enough data to correlate activities. Too much data increases the noise that leads to alert fatigue, and it increases the cost of storing and analyzing the information in the security solution; in this case, that is Azure Log Analytics and Azure Sentinel, but the same applies to other SIEM solutions.

One of the recent shifts in the security data landscape is the introduction of multiple platforms that carry out log analysis locally and only forward relevant events to the SIEM solution. Instead of duplicating the logs and hoping to fish relevant information from them with a single security analysis tool (such as a SIEM solution), new security products focus on gathering specific data and resolving threats within their own boundaries; examples include the following:

  • Identity and Access Management (IAM) for continuous analysis and condition-based access, per session
  • Endpoint Detection and Response (EDR) for detailed analysis on every host, with centralized analytics across devices for threat mapping and remediation
  • A cloud access security broker (CASB) for user-behavior analytics across firewalls and external cloud-based solutions
  • A next-generation firewall (NGFW) for monitoring and responding to dynamic changes in behavior across internal- and external-facing networks

    Note

    Refer to Chapter 1, Getting Started with Azure Sentinel, for further details about each of these solutions.

Each of these solutions already gathers large volumes of data from its respective data sources; therefore, there is no need to duplicate that data in the SIEM log storage. Instead, these solutions can be integrated with the SIEM so that they only send relevant and actionable information, enabling the SIEM to act as the central point of communication for analysis, alerting, and ticketing. The net result is a reduction in duplication and overall solution cost. This idea is summarized in the following diagram:

Figure 3.1 – Data for security operations
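To make the filter-and-forward idea concrete, here is a minimal sketch of the selection step a local platform (an EDR or CASB, for example) might apply before sending events on to the SIEM. The field names, severity levels, and function name are illustrative assumptions, not a real product API:

```python
# Hypothetical sketch: a local analysis platform forwards only relevant,
# actionable events to the SIEM instead of every raw log line.
# Thresholds and field names are illustrative assumptions.

RELEVANT_SEVERITIES = {"high", "critical"}

def select_events_to_forward(events):
    """Keep only the events worth sending on to the SIEM."""
    return [
        e for e in events
        if e.get("severity") in RELEVANT_SEVERITIES or e.get("actionable", False)
    ]

raw_events = [
    {"id": 1, "severity": "low", "actionable": False},
    {"id": 2, "severity": "critical", "actionable": True},
    {"id": 3, "severity": "medium", "actionable": True},
]

forwarded = select_events_to_forward(raw_events)
# Only events 2 and 3 are forwarded; event 1 stays in local storage.
```

In practice, the selection logic lives inside each product's own configuration, but the effect is the same: the SIEM receives a small, high-value stream rather than a duplicate of every source log.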

When dealing with large data volumes, we can use the 7 Vs of big data to guide our decisions about which data is the right data to collect, based on the priority assigned to each V:

  • Volume: This directly impacts the cost of moving and storing the data.
  • Velocity: This impacts the time to respond to an event.
  • Variety: Are we including every aspect of apps and infrastructure? Where are the blind spots?
  • Variability: Is the information easy to understand and act upon?
  • Veracity: Do we trust the source and accuracy of the data?
  • Visualization: Can we use this data to create visualizations and comparisons?
  • Value: Consistently review the value of the data, reducing waste and retaining what is useful.

Here is an example of how to apply one of these values to prioritize and justify the data: for volume, instead of focusing on the sheer amount of data, focus on its quality and variety so that it provides accurate and actionable information across multiple systems.
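One way to turn the 7 Vs into a repeatable decision is to rate each candidate data source against each V and weight the ratings by the priorities your organization assigns. The following sketch is purely illustrative; the weights, source names, and ratings are made-up examples, not recommendations:

```python
# Illustrative sketch: score candidate data sources against the 7 Vs,
# weighting each V by the priority assigned to it (all numbers are examples).

WEIGHTS = {
    "volume": 1, "velocity": 2, "variety": 3, "variability": 2,
    "veracity": 3, "visualization": 1, "value": 3,
}

def score_source(ratings, weights=WEIGHTS):
    """Weighted score for one data source; ratings are 0-5 per V."""
    return sum(weights[v] * ratings.get(v, 0) for v in weights)

candidates = {
    # A hypothetical high-value source: trusted, varied, actionable.
    "firewall_logs": {"volume": 2, "velocity": 4, "variety": 3,
                      "variability": 3, "veracity": 5, "visualization": 3,
                      "value": 4},
    # A hypothetical low-value source: noisy and rarely actionable.
    "debug_traces": {"volume": 1, "velocity": 1, "variety": 1,
                     "variability": 2, "veracity": 3, "visualization": 1,
                     "value": 1},
}

scores = {name: score_source(r) for name, r in candidates.items()}
```

Sources scoring below a threshold you choose become candidates for exclusion from ingestion, which is exactly the cost-versus-value trade-off the 7 Vs are meant to surface.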

A summary of this topic is shown in the following diagram:

Figure 3.2 – The 7 Vs of big data

You can use the chart shown in the preceding diagram to make your initial assessment of the types of data you need to ingest into Azure Sentinel and the data that can be excluded. We recommend that you also review this periodically to ensure you are maintaining a healthy dataset, either by adding more data sources or by tuning out data that no longer meets the requirements (but still costs money to store and process).