Delivering Project & Product Management as a Service

A keyboard that has an attack key

Dealing with a security event as a mini project

Prologue

I’m usually inclined toward regular routine: one builds an MRD, agrees with stakeholders on the road map, does the planning and some HLD, gets the message and priorities down to the software mines, does some LLD, tests… and lives with the results.

Yet sometimes life gives cashews to those accustomed to peanuts. Several years ago, while leading a cyber defense project for a large multinational, I got a late-night text message telling me to come to the office early the next morning for an urgent meeting. It arrived suspiciously soon after I had uploaded some sniffing software (for learning purposes, of course). So I had a sleepless night wondering how I could stuff all my belongings into a small cardboard box at such short notice.

Anyway, morning came, and when I entered the meeting, bleary eyes and all, I was told to postpone my comfortable routine and follow up on a security incident that had developed overnight.

ISO 27002, NIST, CERT/CC and a few other standards define step-by-step processes for incident handling, so I’ll try to describe the steps in order, with some sugar added.

Detection

As with most large companies, a large part of the networked corporate activity comes from M&A or other vertical integration efforts, even with vendors. There is always an effort to assimilate those entities into the enterprise’s security infrastructure, or to mitigate the risks that come with a moving perimeter. In this case the firm operated several franchises, and in one of those semi-independent branches a user noticed unusual activity on a computer. People are used to some automation run by IT, but that follows a regular pattern, and this looked as if some script was running in a non-transparent way.

This vigilant person called the company’s SOC, and some inspection showed that there was PowerShell-based malware on his computer. PowerShell is actually a good sysadmin tool that unfortunately can be used, and is used, as a malware engine. An analyst was awakened at night to review the logs, which showed that the malware existed on other computers as well. Next it was verified that the script had already posted an encrypted payload to one of the known botnet C&C centers. Hysteria was building and the CEO was notified (I guess his sleep wasn’t good either).

The Triage

Taken from medical jargon, triage is the severity assessment of the patient, and this one seemed very ill, especially since a payload had been sent out of the company’s premises and the detection had happened only by chance. It was decided that we go into ER routine, which brought back old memories from the armed forces. People act differently when they are told an enemy has breached the camp’s fences. Suddenly you have management attention, and business people change their priorities: they become available at short notice and accept instructions that are usually ignored. Nothing like good old FUD (Fear, Uncertainty & Doubt) to motivate the crowds.

Analysis & Incident response

The usual process for treatment is linear: you collect the data, examine it using various forensic tools, and analyse the information gathered up to the level of evidence you want to report.

However, when the patient is bleeding you start with a tourniquet before you do the X-rays. So we split the effort into three streams:

  1. Long term: A forensic company was hired to assess the vulnerabilities that caused the problem. This effort produced a report that landed on my desk a few weeks later and was used mostly for verification purposes. We also knew that some activities had to continue indefinitely, such as blocking outbound traffic on non-standard ports (anything other than 80 & 443). Since some business processes open ports as part of their day-to-day activity, this blocking effort was scheduled as ongoing and long term.
  2. Medium term: We listed all the mitigation activities that could be decided on at short notice, such as password changes and the termination of generic accounts. Carrying them out, however, had to go through change management, which takes time: mapping and changing accounts involves various stakeholders if the impact on business processes is to be kept low.
  3. Short term: These were activities that could be decided and performed immediately in order to reduce risk. Examples: blocking the computers that had the malware on them, applying a policy that stopped all PowerShell usage except on the specific computers that require it, and so forth.
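The short-term rules in item 3 are simple enough to write down as a decision function. What follows is only a minimal sketch of that logic, not the tooling we actually used: the host names, the `POWERSHELL_ALLOWLIST` set and the action labels are all hypothetical, and in real life the decision would be enforced through endpoint or group-policy tooling rather than a script.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    infected: bool = False  # set from SOC findings (hypothetical field)


# Hypothetical list of machines that genuinely need PowerShell.
POWERSHELL_ALLOWLIST = {"admin-ws-01", "build-srv-02"}


def short_term_action(host, allowlist=POWERSHELL_ALLOWLIST):
    """Return the immediate containment action for one host."""
    # An infected machine is cut off first, even if it is allowlisted.
    if host.infected:
        return "isolate"
    # Clean machines keep PowerShell only when explicitly allowlisted.
    if host.name in allowlist:
        return "allow-powershell"
    return "block-powershell"


if __name__ == "__main__":
    for h in (Host("pc-17", infected=True), Host("admin-ws-01"), Host("pc-42")):
        print(h.name, "->", short_term_action(h))
```

Checking infection before the allowlist is a deliberate choice in this sketch: an exception granted for convenience should never outrank containment.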

Communication planning, and follow-up

All the action items and mitigation tasks were listed on an intranet site open to everyone in the company, in a matrix that categorized each task by priority and responsibility. The short-term activities took several days to perform, during which we held a daily meeting. The real challenge was performing the medium- and long-term actions, since momentum and motivation fade away quickly.

We kept this matrix in public view, marking delayed mitigation tasks, and notification emails were sent periodically. Some business processes had to wait for product upgrades, new versions, or sometimes people returning from vacation, so follow-up was imperative. All in all, the incident response took several months to complete.
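For readers who like to see such a matrix as data, here is a minimal sketch of the idea: tasks tagged with a term and an owner, a check for delayed items, and a per-owner reminder list. The `Task` fields, owner names and dates are invented for illustration; our real matrix lived on an intranet page, not in code.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Task:
    title: str
    term: str    # "short", "medium" or "long"
    owner: str   # person or team responsible
    due: date
    done: bool = False


def overdue(tasks, today):
    """Tasks that are still open after their due date."""
    return [t for t in tasks if not t.done and t.due < today]


def reminders(tasks, today):
    """Group overdue task titles by owner, e.g. for a periodic email."""
    by_owner = {}
    for t in overdue(tasks, today):
        by_owner.setdefault(t.owner, []).append(t.title)
    return by_owner


if __name__ == "__main__":
    # Hypothetical sample rows.
    matrix = [
        Task("Isolate infected PCs", "short", "SOC", date(2020, 1, 3), done=True),
        Task("Rotate passwords", "medium", "IT", date(2020, 1, 20)),
        Task("Retire generic accounts", "medium", "IT", date(2020, 2, 10)),
        Task("Restrict outbound ports", "long", "Network", date(2020, 6, 1)),
    ]
    print(reminders(matrix, today=date(2020, 2, 1)))
```

Passing `today` in explicitly, rather than reading the clock inside the function, keeps the overdue check easy to test and to replay for any reporting date.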

Closure & lessons learned

The obvious lesson is to use a public dashboard that all stakeholders can see, and to dedicate management attention for the entire life span of the mitigation plan.

The less obvious lesson is that black swans and abnormal events will happen. And when they do, we can make the most of them, since they pull people out of the mundane routine. The Hawthorne effect can be used to get productivity and results that are not achievable in day-to-day organizational life. If you’re abused, “use it to your advantage, and don’t get used to it”.

As a side note, I sometimes wonder whether some IS manager will be brave enough to hire a penetration-testing red team without the company’s full knowledge, just to shake things up his way.