OSSIM applied to ITIL
Thu, 17 Jan 2008

Recently I stumbled across an interesting article about Microsoft, open source and ITIL in which ossim was mentioned (the article can also be found by googling for "ossim itil microsoft" in case the link breaks).

I've never been very keen on learning ITIL (although I've heard about it everywhere during the last year), but this really caught my attention. In that paper ossim is referenced only in the "security management" section, but I think that's mainly because ossim was hard to install, set up and understand when that article was written. So I thought I'd give it another try from my point of view, taking the included tools into account for the different ITIL sections.

So, the goal of this article is to extend and improve on that other article, giving some thought to how I'd approach all those ITIL recommendations from an OSSIM point of view.

The Information Technology Infrastructure Library is made up of two main sets and a series of subsets (from what I've read in that article and on Wikipedia):

  • Service Support
  • Service Delivery

Note: The definitions after each topic have been quoted from the MS article since they're small and concise.



The following diagram illustrates a sample support request handled according to ITIL (thanks Gabi, although there are some typos ;-)):



Service Support

Incident Management

Solving incidents and restoring services quickly.

The incident manager is the obvious choice for this activity from an OSSIM point of view, with a couple of details and exceptions mentioned below.

Five points are mentioned as important for incident management:

  • Detect and Record an incident (the main incident manager)
  • Classify (Incident types)
  • Initial Support (This requires more manual intervention, although automated URLs with information could be sent to the users involved in the incident)
  • Investigate and Resolve (With forensics, realtime viewers, vulnerability databases and everything logged in a central location there are plenty of tools for doing this)
  • Track, Monitor and Communicate (Specific metrics / dashboards could be designed for this, also check the "Report" section at ossim.com)
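
To make the five points a bit more concrete, here is a minimal sketch of an incident moving through those stages. This is not ossim code: the `Stage` and `Incident` names and the sample incident are made up for illustration, and a real incident manager would of course store much more.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    """Hypothetical stages, loosely following the five points above."""
    RECORDED = 1      # detect and record
    CLASSIFIED = 2    # classify (incident type)
    SUPPORTED = 3     # initial support
    RESOLVED = 4      # investigate and resolve
    CLOSED = 5        # tracked, communicated and closed


@dataclass
class Incident:
    title: str
    stage: Stage = Stage.RECORDED
    incident_type: str = "unclassified"
    history: list = field(default_factory=list)

    def advance(self, note: str) -> None:
        """Move to the next stage and keep a note, so the incident can be tracked."""
        if self.stage is Stage.CLOSED:
            raise ValueError("incident is already closed")
        self.stage = Stage(self.stage.value + 1)
        self.history.append((self.stage.name, note))


# Illustrative usage with invented data.
incident = Incident("web server unreachable")
incident.incident_type = "availability"
incident.advance("typed as an availability incident")
incident.advance("sent an information URL to the affected users")
```

The `history` list is the part that matters for "Track, Monitor and Communicate": every stage change leaves a trace that a report or dashboard could read later.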

Problem Management

Solving root cause problems to prevent future incidents.

  • Problem and Error Control (Well, this is what a SIEM is used for 90% of the time, isn't it? Linking to the root description for overview.)
  • Proactive Management (Identifying problems and errors before they occur. Anomalies can be a very valuable tool for this)
  • Report (Also check the "Report" section under ossim.com)
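
As a rough illustration of why anomalies help with proactive management, the sketch below flags a measurement that strays too far from its own history. It is a generic baseline check, not the way ossim computes anomalies; the function name, the threshold of three standard deviations and the sample numbers are all assumptions.

```python
from statistics import mean, stdev


def is_anomalous(history, value, threshold=3.0):
    """Return True if `value` is more than `threshold` standard
    deviations away from the mean of `history`.

    `history` needs at least two samples, otherwise stdev() raises.
    """
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat history: any different value is a change.
        return value != baseline
    return abs(value - baseline) / spread > threshold


# Invented example: connections per minute seen for one host.
connections = [100, 104, 98, 101, 97, 102]
spike = is_anomalous(connections, 150)
normal = is_anomalous(connections, 103)
```

With these numbers the jump to 150 is flagged while 103 is not, which is the kind of early warning that can point to a problem before it turns into an incident.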

Configuration Management

Maintaining all necessary information about services, service components, and relationships.

At first I was confused and didn't see how ossim could fit into this. The following tasks are mentioned as being important for this part. As I didn't see how they fit, I haven't linked them to any specific sites.

  • Planning
  • Identification
  • Configuration Control
  • Status Accounting
  • Verification and Audit

But after this they point out a series of important things that software would have to do in order to help with this, namely:

  • Discover devices on a network
  • Determine the host OS
  • Determine OS version and patch levels
  • Determine which applications are installed
  • Detect any changes to the configuration

Well, this started to sound familiar. OCS, Nmap network inventory, the whole policy area, p0f, pads, arpwatch all fit perfectly in this section.
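
Detecting configuration changes mostly boils down to comparing two inventory snapshots. The sketch below assumes each snapshot is a plain dictionary mapping a host to its known attributes. That layout, the `diff_inventory` name and the sample hosts are invented for illustration and are not the format used by OCS or any other tool mentioned here.

```python
def diff_inventory(old, new):
    """Compare two inventory snapshots and list what changed.

    Each snapshot maps host -> {attribute: value}. Returns a list of
    (host, what, before, after) tuples, sorted by host.
    """
    changes = []
    for host in sorted(old.keys() | new.keys()):
        before, after = old.get(host), new.get(host)
        if before is None:
            changes.append((host, "added", None, after))
        elif after is None:
            changes.append((host, "removed", before, None))
        else:
            for key in sorted(before.keys() | after.keys()):
                if before.get(key) != after.get(key):
                    changes.append((host, key, before.get(key), after.get(key)))
    return changes


# Invented snapshots taken at two different times.
yesterday = {
    "10.0.0.5": {"os": "Linux 2.6", "apache": "2.2.4"},
    "10.0.0.9": {"os": "Windows XP"},
}
today = {
    "10.0.0.5": {"os": "Linux 2.6", "apache": "2.2.6"},
    "10.0.0.7": {"os": "FreeBSD 6"},
}
changes = diff_inventory(yesterday, today)
```

Here `changes` would report the Apache upgrade on 10.0.0.5, the new host 10.0.0.7 and the disappearance of 10.0.0.9, each of which could then be turned into an event or an incident.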

    Change Management

    Controlling the implementation of changes in the infrastructure.

    This is an area where ossim can be greatly improved. OCS already includes some "Install software updates and configuration changes" functionality but it's not fully integrated.
    The article suggests Bcfg2, cfengine or Webmin for this. We already considered using Webmin for the installer CD configuration, so this would be an obvious addition.

    The rest of it is covered by the Incident manager, reports, executive panels and OCS.

    • Filtering Changes
    • Implementing Changes
    • Review and Close
    • Report

    Release Management

    Controlling the rollout of new releases in the infrastructure.

    Again there are things missing for this to be fully covered by ossim. Integrating Zenoss would be an option, although with tight OCS integration and some additional development this should be easy to accomplish without additional dependencies. Maybe Webmin too.

    Service Delivery

    Service Level Management

    Defining and implementing clear agreements for service delivery between an IT organization and its customers.

    This on the other hand is something which is fully covered by ossim. Having implemented lots of metrics and measurements it is very easy to:

    Financial Management for IT Services

    Ensuring the proper management, maintenance, and financial operation of IT.

    This is more of a human task than an ossim one. I guess metrics could be enforced if the data related to all of this is stored somewhere but I'd need to investigate it some more.

    Capacity Management

    Optimizing capacity to meet service requirements at an acceptable cost.

    This is a very interesting expertise area, where some parts are covered by ossim and others not so much. The article mentions Zenoss and Hyperic HQ as tools that meet the needs for this, and I guess ossim as is also meets many of the needs:

    • Monitoring (Including trends and forecasting due to heavy RRD usage: Nagios, Ntop, ocs, etc...)
    • Analysis (Most of ossim can be used for this)
    • Demand Management (Most of ossim can be used for this)
    • Modeling (Policy establishes a baseline and anomalies provide information on how this has changed over time)
    • Planning (Most of ossim can be used for this)
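
The "trends and forecasting" part of monitoring can be sketched with a simple straight-line fit over past samples. This is only a generic least-squares example, not what RRD-based tools do internally; the function names and the disk usage figures are assumptions made up for the example.

```python
def linear_fit(values):
    """Least-squares line through (0, values[0]), (1, values[1]), ...

    Returns (slope, intercept). Needs at least two samples.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def periods_until_full(values, capacity):
    """Estimate how many sampling periods after the last sample the
    fitted line reaches `capacity`. Returns None if usage is not growing."""
    slope, intercept = linear_fit(values)
    if slope <= 0:
        return None
    return max(0.0, (capacity - intercept) / slope - (len(values) - 1))


# Invented weekly disk usage in percent.
usage = [50, 52, 54, 56, 58]
weeks_left = periods_until_full(usage, 100)
```

With usage growing two points a week from 58%, the estimate comes out at 21 weeks until the disk is full, which is the kind of number a capacity planning panel would want to show.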

    Availability Management

    Ensuring the availability of IT resources to meet agreed upon service levels.

    This obviously is fully covered by monitors and executive panels with metrics:
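
As one example of such a metric, availability over a period can be derived from the downtime a monitor has recorded. The sketch below is plain arithmetic under assumed inputs (outage lengths in minutes and a target percentage); the function names and numbers are hypothetical and not taken from Nagios or ossim.

```python
def availability_percent(period_minutes, outage_minutes):
    """Percentage of the period during which the service was up.

    `outage_minutes` is a list with the length of each outage.
    """
    downtime = sum(outage_minutes)
    if period_minutes <= 0 or downtime > period_minutes:
        raise ValueError("invalid period or downtime")
    return 100.0 * (period_minutes - downtime) / period_minutes


def meets_target(availability, target_percent):
    """True if the measured availability reaches the agreed level."""
    return availability >= target_percent


# Invented figures: two 5-minute outages in a 10000-minute period.
measured = availability_percent(10000, [5, 5])
ok_for_99_5 = meets_target(measured, 99.5)
ok_for_99_95 = meets_target(measured, 99.95)
```

Ten minutes of downtime in that period gives roughly 99.9% availability, which would satisfy a 99.5% agreement but not a 99.95% one.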

    IT Service Continuity Management

    Defining and maintaining appropriate Disaster Recovery plans for IT.

    This is also a very manual and off-ossim task, although some parts could help with it (monitoring, SLAs, etc.).

    Security Management

    Ensuring the proper access to services as defined by agreements and industry best practices.

    This is where the article mentions ossim, although without fully expanding on what can be covered using ossim. Four main tasks are required for this:




    Conclusion

    This article is more of an exercise of what could be done rather than a step by step guide on how to implement it. Obviously that step-by-step guide is now on my todo list but that requires much more than the couple of hours I've spent writing this up.

    Anyway, I hope this article gave a quick overview of how ossim can be applied to ITIL.

    Remember the graph at the top and the numbers? Let's summarize where each task would fit in ossim.

    1. The new incident creates a log read by the agent, a response is issued by policy and a new incident inserted.
    2. Using the incident manager we tag and typify the incident.
    3. Next we analyze events, alarms, monitors and reports checking what could be wrong. We notice the service level and metrics at executive panel decreasing.
    4. Through the incident manager we subscribe the people who can fix the issue to the incident, and describe the actions needed in order to fix it.
    5. Another executive panel reflects to the customer what the current issue means to their business in terms of money.
    6. Again we track everything using the incident manager and issue a patch rollout using OCS's automatic software installation feature.
    7. We change our inventory information based on the new automatic changes and keep track of them using the incident manager. The anomalies panel gets automatically updated once the new versions get detected.
    8. Nagios will reflect the downtime and affect the overall service level due to this issue.
    9. Policy, executive panels and directives get updated if needed after this new situation.
    10. The closed incident gets reported to the customer and will show up on incident reports.

    And to finish I'd like to quote a friend's comment about ITIL: ITIL is a series of things everybody with a little bit of common sense would do in an enterprise if he had the time, the people or the money to do it. So if it's not being done I bet it will take more than a series of best practices to change that ;-)

    posted at: 17:33 | path: /ossim



    * Posted by Gabriel Díaz at Tue Jan 22 10:33:42 2008
    Hello

    About the comment, I would say it is easy to get lost in the details and forget about the big picture of the company organisation.

    ITIL helps in defining a big picture that has been tested by other organisations. It is well known that it works. When you have it implemented in your organisation, you can spend more time defining the details of the services than making your company components work together.


    Or at least, that's what they sell... I've never seen it myself :)
