Tuesday, November 15, 2016

Incident Management Planning with a Tabletop Exercise

Within an operational IT ecosystem, issues occur continuously.  Most incidents relate to application defects, resource exhaustion, or the outage of a core infrastructure component such as a server, hypervisor, storage subsystem or network.

A whole range of other failures and issues can occur because of operational error, configuration mistakes, or any number of malicious attacks.

In the course of managing an IT ecosystem, regulated firms need to determine which of this plethora of events are to be regarded, classified and handled as incidents.

Typically, a regulated company will work closely with its cloud service provider to understand incident management roles and responsibilities in detail.  Before deploying production workload, the stakeholders conduct a tabletop exercise, a technique derived from the planning of military operations.

The tabletop exercise is a meeting to discuss simulated emergency situations. Stakeholders review and consider the actions required in multiple emergency scenarios, testing their incident response plan in an informal, low-stress environment. Central to the exercise is creating an understanding of the information needed to handle an incident, the sources of that information, the decision-making roles and responsibilities, and the sequence of hand-offs.

A tabletop exercise also goes beyond simply understanding an incident.  While not all scenarios will cause business impact or an outage in service availability, a tabletop will cover how the company makes provisions to ensure continuity of business.  Equally, a tabletop may produce recommendations for ensuring an incident does not recur, or specify how to preserve information for subsequent external inspection and audit.

Ahead of time, stakeholders will agree on a set of potential incident scenarios of varying severity.  Each will be simulated to test the possible response, with the essential steps captured and documented.


Finally, a tabletop exercise can be useful in contractual discussions between a client and a cloud service provider during the creation of a Cloud Services Agreement.  The tabletop's output might be a RACI matrix that unambiguously specifies roles and responsibilities.
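
To make this concrete, here is a minimal sketch of how a tabletop's RACI output could be captured and reviewed. It is purely illustrative: the activities and assignments are hypothetical placeholders, not a recommended allocation.

    # A minimal RACI sketch (hypothetical activities and parties).
    # R = Responsible, A = Accountable, C = Consulted, I = Informed
    raci = {
        "detect hypervisor outage":  {"cloud provider": "RA", "client": "I"},
        "declare business impact":   {"cloud provider": "C",  "client": "RA"},
        "notify the regulator":      {"cloud provider": "I",  "client": "RA"},
        "restore affected workload": {"cloud provider": "R",  "client": "A"},
    }

    for activity, assignments in raci.items():
        roles = ", ".join(f"{party}: {codes}" for party, codes in assignments.items())
        print(f"{activity} -> {roles}")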




Reference Examples:

The FDIC has good planning materials for evaluating cyber incidents.
https://www.fdic.gov/regulations/resources/director/technical/cyber/cyber.html

https://www.youtube.com/results?search_query=table+top+incident+managment returns multiple video examples of tabletop exercises focused on public emergency or business continuity situations.

Sunday, November 6, 2016

Consider These Themes Before Selecting a Cloud Provider for Your Regulated Workloads

Might I suggest you consider the following important points as you select a cloud provider...
  • Are cloud computing offerings the core business of your chosen cloud providers?
  • Is cloud a financially viable business for the cloud provider?
  • Does the cloud provider have a strong technical vision, ability to deliver and proven expertise?
  • How does the cloud service provider maintain a compliant position when using third-party staff? What contractual arrangements are in place that enable compliance to be asserted or validated?
  • Are data centers and operational function locations appropriately secured?
  • What are the plans for Continuity of Business and Disaster Recovery?  Do major outages impact a client? What capacity remains available in the event of an outage, and how can it be reserved and accessed?
  • What track record and availability statistics exist for the service offerings?
  • References from existing clients within regulated industries
  • Unambiguously documented roles and responsibilities (especially for availability, monitoring, incident management, security, and privacy)
  • Reporting capabilities for availability, usage and financial metrics
  • Ability to assure infrastructure, storage, and staffing location 
  • Compliance with published regulatory standards
  • How should consideration of the above change when buying higher value offerings such as PaaS and SaaS?
  • Does the cloud provider understand how to sell and service enterprise clients?
  • Does the cloud provider encourage a one-size-fits-all approach? That is unlikely to work for regulated industries.
  • Can the cloud provider support hybrid on-prem/off-prem deployment models with a supporting ecosystem of connectivity, consistency, and interoperability?
  • Is pricing competitive?  
  • Is the cloud provider profitable and sustainable?

*** Vic Winkler's book, "Securing the Cloud: Computer Security Techniques and Tactics", inspired me in creating this list.


    Thursday, October 27, 2016

    Media Sanitization: Even if you Erase Your Data, is it Really Gone?

    A topic that I've seen multiple times on cloud security/privacy evaluation questionnaires centers on how data is disposed of and erased such that it cannot be retrieved.

    It goes without saying that there are many subtleties to how this question is answered, and the characteristics of the underlying infrastructure become huge considerations.

    Most people understand that commodity operating systems do not remove files from physical media when a delete is requested.  A delete operation merely places a deletion mark in the file's index entry within the file system's directory structure. Marking a file as deleted makes its physical blocks or sectors available for re-use, but they are typically only overwritten later, when new data is written or an existing file is extended.  Until that time, a deleted file is likely recoverable using simple, readily available utilities.

    Within cloud services, multiple storage options are available.  These range from locally attached HDD, SSD or flash to SAN- or NAS-attached storage servers.  For each of these options, the sanitization approach is likely to differ, and, depending on the cloud provider, it may be difficult to get a definitive answer about whether sanitization is even available.

    Many of us have used destructive deletion utilities on our personal devices. These tools use a technique called overwriting, in which a file or block is overwritten multiple times with zeros or streams of random characters. However, many people have yet to realize that the overwrite technique applies only to magnetic media.
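
    As a rough sketch of the technique, the Python fragment below overwrites a file's contents with random bytes before unlinking it. It is illustrative only: the pass count is arbitrary, and, as discussed next, it gives no such guarantee on flash or SSD media.

        import os
        import secrets

        def overwrite_and_delete(path, passes=3):
            # Overwrite the file in place, then unlink it. Only meaningful
            # on magnetic media; flash controllers may remap writes elsewhere.
            size = os.path.getsize(path)
            with open(path, "r+b") as f:
                for _ in range(passes):
                    f.seek(0)
                    remaining = size
                    while remaining > 0:
                        chunk = min(remaining, 1024 * 1024)
                        f.write(secrets.token_bytes(chunk))
                        remaining -= chunk
                    f.flush()
                    os.fsync(f.fileno())  # force each pass out to the device
            os.remove(path)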

    Meanwhile, cloud services are increasingly making use of locally attached flash or solid-state drives (flash in an HDD form factor). To preserve performance and lifespan, flash controllers distribute writes across the entire available memory capacity of a device.  Because of this wear leveling, overwriting a file's logical blocks gives no assurance that the original physical cells are touched, and forcing an overwrite of every unused sector would severely degrade the device's lifespan.  To address this, flash manufacturers instead provide device-level reset capabilities (commonly exposed as the ATA Secure Erase command) that destructively erase every sector at once.  Overwrite is simply not available at the file level.

    Therefore, as a cloud user, it is imperative to understand whether you can discover information about the physical infrastructure and component parts of your server.  IBM Cloud's bare metal servers make this knowledge readily available for information and audit purposes.

    Before releasing a bare metal server back to a cloud provider's inventory, a client with operating system access can execute appropriate data destruction techniques or contract a service provider to do the same.  It is also possible to contract for the cloud provider to physically destroy a server and its storage components as needed, especially at end-of-life.

    As one moves further away from physical infrastructure to virtualized network-attached and potentially multi-tenant storage, the question of assured media sanitization becomes much more challenging, if not impossible.

    For this reason, other techniques such as encryption of data at rest become necessary.  We'll discuss these another time.


    For more background reading on this, you might enjoy:

    NIST SP 800-88, Guidelines for Media Sanitization (PDF)


    As always, your thoughts and comments will be enjoyed and appreciated.



    Tuesday, October 11, 2016

    Asserting Data Ownership in Regulated Cloud

    When deploying to the cloud, regulated companies need to be able to make unambiguous assertions about their data ownership.

    The primary question about data ownership centers on who can access regulated data, and how to protect that data from unintended access by unauthorized users.

    It goes without saying that we expect data placed in a cloud environment to be encrypted.  Both data in motion and data at rest are likely to be encrypted, with the encryption/decryption operations performed by the communication, application, middleware or base infrastructure tiers.

    An encryption algorithm needs one or more encryption keys to determine how it scrambles and unscrambles stored data or streams of data in motion.  An encryption key is constructed from a random stream of bits; the longer the stream, the harder the key is to crack by brute force.
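
    As a minimal illustration, a strong key is nothing more than a cryptographically secure stream of random bits; Python's standard secrets module is one way to obtain them (the key size and usage here are illustrative):

        import secrets

        key = secrets.token_bytes(32)  # 32 bytes = 256 random bits, e.g. for AES-256
        print(key.hex())
        # Each additional bit doubles the brute-force search space:
        print(f"search space: 2**{len(key) * 8} possible keys")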

    Like a front-door key controlling access to a building, the encryption key itself has value, just as the data it protects does. For compliance purposes, a regulator typically needs confidence that the regulated company owns its keys and does not share them.

    While there are many ways to store a key, protecting one robustly typically falls to a Hardware Security Module (HSM).  An HSM is a tamper-resistant security appliance in which security parameters can be stored, exercised and retrieved using standard protocols and APIs.

    https://en.wikipedia.org/wiki/Hardware_security_module

    "A hardware security module (HSM) is a physical computing device that safeguards and manages digital keys for strong authentication and provides cryptoprocessing. These modules traditionally come in the form of a plug-in card or an external device that attaches directly to a computer or network server."

    Cloud service providers have recognized the need to provide customer-dedicated HSM devices.  With IBM Cloud's SoftLayer offering, a client can order an HSM from the management portal.  While some cloud providers offer both logical (virtualized, multi-tenant appliance) and physical HSM devices, IBM's approach favors regulated environments by exclusively offering a dedicated hardware option.

    For assurance, when a client orders an HSM, IBM Cloud provides a physical, single-tenant device delivered in factory state and accessible only from the customer's virtual LAN (VLAN).  The client receives a one-time user ID and password and must set the HSM's credentials, which are never captured by or shared with IBM. IBM has no access to the HSM, except to monitor that it is powered on and alive.

    The working principle is that the regulated client never shares HSM access or encryption parameters with the cloud provider. On that basis, a customer subject to regulation can confidently assert that it uniquely owns both its security parameters and its data.

    For more information, take a look at the link below or drop me a message.

    https://www-01.ibm.com/common/ssi/cgi-bin/ssialias?htmlfid=KUS12364USEN


    Sunday, October 9, 2016

    What does it take to satisfy a regulator?

    I've been working recently with a major financial services firm on its adoption of cloud computing infrastructure and services.  We've been thinking at length about what it will take to persuade and satisfy regulators and the Chief Risk Officer that moving application workload to the cloud is safe.


    At the heart of regulatory concerns is whether cloud service providers and financial services firms can manage exposures and vulnerabilities.  As cloud computing encourages the use of shared multi-tenant resources, there is a strong focus on ensuring that financial services data is highly protected and that one client of a shared environment cannot negatively impact another through an accidental or intentionally malicious action.

    When thinking about how to create confidence, a simple framework that outlines the core concepts is necessary. I'd like to describe one of the ways we've been thinking about this, organized around four areas:
    • Control Entities
    • Monitoring Approach
    • Reporting
    • Incident Management

    Control Entities:

    This relates to which components within a cloud environment may be subject to regulatory oversight.


    So far, the list is likely to include (but is not limited to) the following:

    • Hypervisor
    • Physical Server
    • Virtual Server
    • Container
    • Storage System
      • Styles
        • File
        • Object
        • Block
      • Dedicated Storage Server
      • Shared Storage Server
      • Physical HDD (or equivalent) that is locally attached to a server or storage server
      • Software-Defined Storage Overlay
      • Archive System
      • Backup System
      • Physical media
    • Network
      • Physical Networks
      • Virtual Networks (VLANs)
      • Software-defined overlay, e.g. VMware NSX
      • VPNs
      • Dedicated connections between a client location and IBM
      • Firewalls
    • Provisioning and Control Systems
      • including maker-checker processes to ensure that unintended cross-pollination between clients cannot occur
    • Security System
      • Users
      • Entitlements/Access
      • Certificate Management
      • Key Management
      • Event Logging System
    • Security Components
      • Firewalls
      • Hardware Security Module devices and other key managers
      • Intrusion/penetration/attack detection
      • Virus and malware detection
      • Hardware trust system
      • Data leakage
    • Physical and Building Security
      • Access controls and logs
      • Video
      • Environmentals: power and temperature
    • Monitoring System
      • Components
      • Applications/Processes
      • Availability
      • Performance
    • Reporting System
      • for incident management processes
    • Application
    • Bank Data
      • Nature of data
        • includes PII or other regulated content
      • Physical geographic placement
      • Geographic/movement/operative restrictions and limits

    Monitoring Approach:


    • parameters to be monitored
    • monitoring interface to be used (API, script, log file, etc.)
    • normal/abnormal thresholds
    • monitoring frequency
    • feed to health dashboard
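
    As a purely illustrative sketch, the agreed parameters might be exercised as below; the metric names, thresholds, and dashboard feed are placeholder assumptions, not a real provider API.

        import random

        # Placeholder thresholds agreed between client and provider.
        THRESHOLDS = {"cpu_utilization_pct": 85.0, "memory_utilization_pct": 90.0}

        def read_metric(name):
            # Stand-in for the agreed monitoring interface (API, script, or log file).
            return random.uniform(0.0, 100.0)

        def feed_dashboard(name, value, status):
            # Stand-in for the feed to the health dashboard.
            print(f"{name}={value:.1f} [{status}]")

        # One polling cycle; in practice this runs at the agreed monitoring frequency.
        for metric, threshold in THRESHOLDS.items():
            value = read_metric(metric)
            status = "abnormal" if value > threshold else "normal"
            feed_dashboard(metric, value, status)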

    Reporting:

    For normal situations, it will be necessary to present a report that can be viewed by our client and its regulators.
    The report content would need to be determined, as would the delivery format (paper, or more likely a portal) and the delivery frequency (likely quarterly).


    Incident Management:

    For abnormal situations, an immediate reporting mechanism is needed.  
    We would need to determine the following:

    • Notification method
      • may include email, phone call, text message, or delivery of a signed paper document
    • Designated notification contacts
      • definitely the client, possibly the regulators
    • Notification content
      • what happened?
      • severity
      • impact (penetration, data loss, etc.)
    • Forensic diagnostic data or investigation method
      • log files
    • Remediation and corrective actions
      • for both the cloud service provider and the client

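    To make the shape of such a notification concrete, here is a minimal sketch; the field names and sample values are illustrative assumptions, not a regulator-mandated schema.

        from dataclasses import dataclass, field
        from datetime import datetime, timezone

        @dataclass
        class IncidentNotification:
            summary: str      # what happened
            severity: str     # position on the agreed severity scale
            impact: str       # e.g. penetration, data loss
            contacts: list    # definitely the client, possibly the regulators
            evidence: list    # log files, forensic diagnostic data
            remediation: str  # corrective actions for provider and client
            raised_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

        notice = IncidentNotification(
            summary="Unauthorized access attempt on management portal",
            severity="high",
            impact="attempted penetration, no confirmed data loss",
            contacts=["client security operations"],
            evidence=["portal-auth.log"],
            remediation="block source addresses; rotate affected credentials",
        )
        print(notice)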

    In summary...


    The regulatory landscape for cloud computing is highly dynamic, and both cloud service providers and their clients expect these constructs to evolve.  One of the challenges will be to capture these constructs in contractual documents.


    Feel free to add your comments and perspectives.  I'm looking forward to a great discussion.