Can Artificial Intelligence (AI) Automate Information Security?
A few decades ago, many of us computer professionals were using dumb terminals to get our jobs done. Security was not at the top of our minds because the ‘Mainframe God’ protected us from bad things happening. As we all know, those good old days are gone. The advent of social media, broadband wireless communication, affordable mobile computing devices, sensors and the internet of things has brought in a new computing paradigm. We are in the cloud, using precision medicine, precision agriculture and smart machines, and are computing at the edge. We have gone from centralized and controlled computing environments to decentralized, uncontrolled and distributed computing environments. Data is everywhere. The challenge is whether we can guarantee the confidentiality, integrity and availability (CIA) of that data. Can automation, machine learning and artificial intelligence (AI) help in any way?
According to an International Data Corporation report (2017), the world datasphere is predicted to grow from 16 Zettabytes (ZB) in 2016 to 163 ZB in 2025 (1 ZB = 1 billion Terabytes). 87 percent of that data will be sensitive, but only about half of it will have the necessary protection. If security is not enhanced, the estimated cost could be about six trillion dollars in 2025. A McAfee and Center for Strategic and International Studies report (2018) pegged the cost of cybercrime at 600 billion US dollars for the year 2017. Cybercrime is ranked as the third greatest global economic scourge after government corruption and narcotics. It therefore needs to be prevented, and its impact reduced or, if possible, eliminated.
More and more businesses and governments (federal, state, local and tribal) are undergoing digital transformations to reinvent themselves. These transformations are meant to increase efficiency and agility, reduce operational costs, and provide richer customer interactions and better customer service. However, the rapid adoption and expansion of digital services does not give organizations enough time to embed robust security measures before rollout. Cyberthreats can occur at the level of infrastructure, data, applications and people. NIST, the SANS Institute, ENISA and the Ponemon Institute provide information and guidelines on security controls (physical, management and technical controls) to protect organizations from cyberthreats.
The pace of digital transformation and technology advancement has not given cybersecurity professionals enough time to keep up with emerging threats. Nations are struggling to create legislation to prevent and deal with cybercrimes. Therefore, governments and businesses need to adapt and leverage the meager resources they have to identify, design, test and implement cybercrime prevention and management tools and technologies. To overcome the shortage of trained security personnel, businesses and governments can leverage new technologies like big data, machine learning, deep learning and artificial intelligence to enhance, supplement or in some cases replace security personnel. According to a Ponemon Institute survey (2017), 37 percent of the respondents reported over 10,000 alerts per day, and 52 percent of those alerts were false positives. According to a Bank of America report, almost 400 new threats occur every minute and 70 percent of them go undetected.
As problems in cyber security are multi-dimensional, with thousands of features (variables), the legacy way of analyzing threat data using just the signal-to-noise ratio is not going to work very well. Legacy approaches generate many false positives or miss real signals (false negatives). Therefore, it is necessary to detect threat signals by mining vast amounts of machine logs. As the signal-to-noise ratio is very low, detecting attack sequences among isolated and infrequent events is a major challenge. Deep learning, machine learning and AI algorithms and models can help, but they require labelled data for training. The majority of security data has no labels, making it difficult to apply deep learning networks to a large number of cyber security events. However, the security industry is addressing this issue by generating class labels for at least a few use cases. Using these class labels, some machine learning algorithms can now detect malware by ranking malicious websites and DNS domains.
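To see why a low signal-to-noise ratio leads to so many false positives, a small worked calculation may help. The sketch below is only an illustration: the function name alert_breakdown, the event volume, the number of attacks and both detector rates are assumptions chosen for the example, not figures from any of the reports cited above.

```python
def alert_breakdown(events, attacks, true_positive_rate, false_positive_rate):
    """Split a detector's alerts into real and false ones.

    All inputs are hypothetical: `events` is the total number of log events,
    `attacks` is how many of them are truly malicious, and the two rates
    describe an imagined detector.
    """
    benign = events - attacks
    true_alerts = attacks * true_positive_rate      # attacks correctly flagged
    false_alerts = benign * false_positive_rate     # benign events wrongly flagged
    missed = attacks - true_alerts                  # false negatives
    precision = true_alerts / (true_alerts + false_alerts)
    return {
        "true_alerts": true_alerts,
        "false_alerts": false_alerts,
        "missed_attacks": missed,
        "precision": precision,
    }


# Assumed scenario: 1,000,000 events, 100 of them malicious, and a detector
# that catches 95% of attacks while wrongly flagging 1% of benign events.
result = alert_breakdown(
    events=1_000_000,
    attacks=100,
    true_positive_rate=0.95,
    false_positive_rate=0.01,
)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Under these assumed numbers, the detector raises roughly 95 real alerts against roughly 9,999 false ones, so fewer than 1 in 100 alerts points to an actual attack even though the detector looks accurate on paper. This is the arithmetic behind the flood of false positives when real attacks are rare.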
Another very reliable use case of data science for security is building a baseline of normal behavior for each user, network device or other entity within the network and comparing it with real-time data to find rare or abnormal behavior. Sometimes, this does result in false positives. Therefore, we need to integrate multiple technologies, including AI, with human security experts to improve protection, prevention, detection and recovery. Automation (for data filtering) and AI models can act as a data sieve and will be very helpful in detecting non-obvious patterns in mountains of machine data. This will help security experts focus on a manageable amount of information when analyzing incident or threat data and deciding how to respond. I have not come across a reliable security autopilot (AI) yet.
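The baseline comparison described above can be sketched in a few lines. This is a minimal illustration, not a production detector: it assumes a single numeric feature per entity (for example, daily login or connection counts), uses a simple z-score as the measure of abnormality, and all names, sample values and the threshold of 3.0 are hypothetical.

```python
from statistics import mean, stdev


def build_baseline(history):
    """Return {entity: (mean, standard deviation)} from past daily counts."""
    return {entity: (mean(counts), stdev(counts)) for entity, counts in history.items()}


def flag_anomalies(baseline, current, threshold=3.0):
    """Return (entity, value, z_score) for values far from the entity's baseline.

    Entities with no baseline are flagged with a z_score of None so that a
    human analyst can review them.
    """
    flagged = []
    for entity, value in current.items():
        if entity not in baseline:
            flagged.append((entity, value, None))
            continue
        mu, sigma = baseline[entity]
        if sigma == 0:
            # A perfectly constant history: any deviation at all is unusual.
            z = 0.0 if value == mu else float("inf")
        else:
            z = (value - mu) / sigma
        if abs(z) > threshold:
            flagged.append((entity, value, z))
    return flagged


# Hypothetical history: daily event counts for one user and one device.
history = {
    "alice": [20, 22, 19, 21, 23, 20, 22],
    "printer-01": [5, 5, 6, 4, 5, 5, 6],
}
# Hypothetical real-time observation for today.
current = {"alice": 21, "printer-01": 40}

flagged = flag_anomalies(build_baseline(history), current)
for entity, value, z in flagged:
    print(entity, value, z)
```

With this sample data only printer-01 is flagged, since its count of 40 is far outside its usual range, while alice's count matches her baseline. The output is a short list for a human expert to review, which is the data-sieve role described above rather than an autopilot.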