Challenges of Windows Defender Advanced Threat Protection (ATP) in Azure IaaS
My current employer is an Azure-only shop: we run our entire business on various Azure services. We also lean on a number of Azure's automation, orchestration, and management services, including Azure Operations Management Suite (OMS), which is akin to MOM of years past. This post is about a service included in OMS called Windows Defender Advanced Threat Protection (ATP).
For OMS customers, starting around April 2018, Microsoft has enabled ATP by default. The ATP agent is deployed as part of the Microsoft Monitoring Agent (MMA) on Azure Infrastructure as a Service (IaaS) VMs. To check whether MMA is installed on an IaaS VM, log into the Azure Portal, select Virtual Machines, select the VM you're interested in, and open the Extensions blade.
If the MMA extension is installed on your VM, chances are you have ATP. The challenge is that, in my experience after numerous long calls with Premier Support, ATP's management tooling in Azure is incomplete, and that's being generous. To manage ATP effectively, you need access to the ATP Portal, part of Windows Defender Security Center. However, there's a second portal that is actually used to configure ATP, and that second portal is a licensed feature! So you get ATP whenever you pay for OMS, but you can't actually configure it until you pay more.
At first glance, this seems OK, even genuinely useful. ATP does produce all sorts of great graphs and reports, after all; it's Microsoft's big-data heuristics engine built on machine learning and every other buzzword going. However, the default configuration is both invisible and stringent. In my case, I have two dozen file servers sharing NTFS volumes over SMB, hosting about 100TB of data: millions and millions of relatively small files, anywhere from a few bytes to a few tens of MB.
Admittedly, I don't know what policy ATP uses to scan these files for threats; the portal keeps those parameters behind a paywall we're not interested in today. What I do see is a service, MsSenseS.exe, consuming CPU and holding locks on files. Fortunately, the impact on the file servers isn't too severe, since these files are small and scans of them finish quickly.
Where this becomes a much bigger problem is with backups, archives, and other large files. Twice in the past 3 months I've had to pull down about 500 very large files, ranging from 5GB to 500GB, and compress them for long-term storage in Azure Blob. For each one, I download the file, run 7z, then upload the compressed result. Normally this is a trivial operation, aside from the time and compute it takes. Enter ATP. Again, for reasons that aren't clear, once a file has finished compressing, MsSenseS.exe begins scanning it. The scan seems capped at about 8MB/s, and not because of any compute or IO limit. While the scan runs, the file is also locked, which blocks the delete my script performs after uploading the new archive. For a 100GB file, 8MB/s means my 16-vCPU F16s VM, which costs $1.53 per hour, is hamstrung for roughly 3.5 hours scanning a file I intended to delete in the first place.
In my PowerShell script, I call remove-item -path $myfilename -force. Until MsSenseS.exe releases the file, remove-item simply hangs indefinitely. Ideally, I would be able to define exclusions, as with any other anti-malware/anti-virus solution. As a data professional, I would exclude common SQL Server (MSSQL) file extensions and paths and let the security solution focus on the unknowns. Unfortunately, I can't do this today with ATP, and I can't unenroll either.
Support pointed us towards this difficult LINK. With nearly 1,000 IaaS VMs, many of them critical, ATP has caused us problems numerous times. The only workarounds we've been given:
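Clicking through the Extensions blade one VM at a time gets tedious across a large fleet. A minimal sketch of a scripted check, assuming the Azure CLI is installed and that the JSON printed by az vm extension list --resource-group <rg> --vm-name <vm> --output json reports each extension's publisher and name. The publisher string Microsoft.EnterpriseCloud.Monitoring and the default extension name MicrosoftMonitoringAgent are assumptions to verify against your own output, and has_mma / has_mma_json are hypothetical helper names.

```python
import json

# Assumed identifiers for the Windows MMA VM extension; verify them against
# your own `az vm extension list` output before relying on this check.
MMA_PUBLISHER = "Microsoft.EnterpriseCloud.Monitoring"
MMA_DEFAULT_NAME = "MicrosoftMonitoringAgent"


def has_mma(extensions: list) -> bool:
    """Return True if any extension record looks like the Microsoft Monitoring Agent."""
    return any(
        ext.get("publisher") == MMA_PUBLISHER or ext.get("name") == MMA_DEFAULT_NAME
        for ext in extensions
    )


def has_mma_json(cli_output: str) -> bool:
    """Parse the JSON text printed by the Azure CLI and check it for MMA."""
    return has_mma(json.loads(cli_output))
```

Saving each VM's CLI output and passing it to has_mma_json gives a quick inventory of which hosts likely have ATP riding along with MMA.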
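The cost of a throttled scan is simple arithmetic: time = file size / scan rate, and cost = time × hourly VM rate. A small sketch, assuming decimal units (1 GB = 1000 MB) and the 8MB/s and $1.53/hour figures above; scan_wait is a hypothetical helper name.

```python
def scan_wait(file_size_gb, scan_mb_per_s, vm_usd_per_hour):
    """Return (hours, dollars) a VM spends while a file is scanned at a fixed rate.

    Assumes decimal units (1 GB = 1000 MB); binary units (GiB/MiB) come out
    about 2.4% higher.
    """
    seconds = file_size_gb * 1000 / scan_mb_per_s
    hours = seconds / 3600
    return hours, hours * vm_usd_per_hour


hours, dollars = scan_wait(file_size_gb=100, scan_mb_per_s=8, vm_usd_per_hour=1.53)
print(f"{hours:.2f} hours, ${dollars:.2f}")
```

For a single 100GB file that works out to roughly 3.5 hours and a little over $5 of VM time, and the archive job repeats this for every large file.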
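One way to keep a cleanup step from blocking forever is to retry the delete against a deadline and report failure instead of waiting indefinitely. The script above is PowerShell; this is only a minimal Python sketch of that pattern, assuming that on Windows a delete blocked by another process's open handle surfaces as PermissionError (WinError 32). The function name and the timeout and poll values are hypothetical.

```python
import os
import time


def remove_with_timeout(path: str, timeout_s: float = 600.0, poll_s: float = 5.0) -> bool:
    """Try to delete `path`, retrying while another process holds it open.

    Returns True once the file is gone, or False if `timeout_s` elapses first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            os.remove(path)
            return True
        except FileNotFoundError:
            return True  # already gone; nothing left to clean up
        except PermissionError:
            # Typically a sharing violation on Windows: the file is still locked.
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return False
            time.sleep(min(poll_s, remaining))
```

A False return lets the job log the locked file and move on to the next archive, rather than stalling the whole run behind one scan.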
- Manually unenroll each host using the procedure linked above (applicability unknown)
- Remove each node from OMS and/or uninstall MMA
- Simply terminate MsSenseS.exe whenever it's causing a problem
Are you doing something differently? Please feel free to reach out @sqlmadhadr with any suggestions.