Tutorial 2: Syslog data mining with attached md5sum. AKA "Store 100% of data".

December 6, 2007  |  Dominique Karg

1. The need. The Hype.

There’s obviously a need for storing vast amounts of logs, and few things today can’t log to syslog. So this request comes up every once in a while, and this tutorial illustrates the OSSIM approach to massive syslog data storage. Of course, where you say syslog you can say Windows event log, SNMP data, or whatever generates a large amount of raw data.

Compliance

I don’t know much yet about all of this compliance stuff (I was lucky: Julio has always been much more knowledgeable in that area than me, so I could skip it), but I guess I’ll have to start learning. There are just too many people asking for it and I’m getting very curious.

From what I’ve seen, a short list of regulations requiring, or at least strongly recommending, a certain amount of raw data storage and reporting is:

  • ISO27001/17799
  • SOX
  • HIPAA
  • PCI
  • Basel II
  • NIST 800-53
  • Many more…

(Searching for SIM and compliance information, I see it’s a major marketing point for vendors too. Well, just for the record, ossim helps you to be compliant with all that stuff.)

Centralized logging

Maybe the need is pure sysadmin laziness. You want to be able to answer the questions your management or customers ask in the easiest possible way.

I heard this from a guy a couple of days ago: the more information about your network you’ve got, the more answers you can give, and that’s exactly what SIM/SEM systems are good at.

Data mining

This is a bit redundant with the previous entry, but there are people who don’t care about exact data and are in desperate need of colorful graphs to keep their bosses calm. Well, having logs from everything in your network allows for easy colorful report generation with little knowledge of the underlying data. The worth of those reports in the end will be highly questionable, of course.

2. The preparation.

I thought it would be interesting to explain the process we used to create this plugin in the first place, so this will be a hybrid tutorial: syslog, and how to fetch your own data sources.

Write a new function

Well, the first thing that came to my mind was that, even without knowing much about those compliance regulations, if they’ve been designed with at least a bit of common sense they’ll require a cryptographic stamp on the logs. So let’s just add a little function to our plugin environment that does just that: stamping each line with an md5 checksum.

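import md5  # Python 2 module (replaced by hashlib in Python 3)

def md5sum(datastring):
    # Return the hex MD5 digest of the given log line.
    return md5.new(datastring).hexdigest()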

Just put your functions into ParserUtils.py and they’re available for all plugins. That file usually resides at /usr/share/ossim-agent/ossim_agent/.
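
On a current Python the md5 module no longer exists. A minimal sketch of the same helper using hashlib, plus a check that a stored line still matches its checksum, might look like this. The verify_line helper is hypothetical and not part of ParserUtils.py.

```python
import hashlib


def md5sum(datastring):
    """Return the hex MD5 digest of a log line (str or bytes)."""
    if isinstance(datastring, str):
        datastring = datastring.encode("utf-8")
    return hashlib.md5(datastring).hexdigest()


def verify_line(logline, stored_checksum):
    """Hypothetical helper: True if the line still matches its stored checksum."""
    return md5sum(logline) == stored_checksum


line = "Sep  6 12:07:26 ossim-devel su[9886]: FAILED su for root by juanma"
checksum = md5sum(line)
# The unchanged line verifies; the same line with one extra character does not.
print(checksum, verify_line(line, checksum), verify_line(line + " ", checksum))
```

Changing a single character of the line changes the digest, which is what makes the stored value useful as a check on the original logline.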

Write a new plugin

The next thing we’ll do is create a new plugin. We want this plugin to:

  • Get every single syslog line
  • Respect the original logline
  • Checksum that line

After a bit of toying around, I saw it would also be easy to:

  • Extract the sensor from the line
  • Extract the source ip
  • Extract the originating process
  • Extract the PID
  • Extract the line without the “changing” part

All of this while respecting the original line of course.

;; syslog
;; plugin_id: 4007
;; type: detector
;;

[DEFAULT]
plugin_id=4007

[config]
type=detector
enable=yes

source=log
# Enable syslog to log everything to one file. Add it to log rotation also.
# echo "*.*     /var/log/all.log" >> /etc/syslog.conf; killall -HUP syslogd
location=/var/log/all.log

# create log file if it does not exist,
# otherwise stop processing this plugin
create_file=true

process=
start=no
stop=no
startup=
shutdown=

## rules

[syslog - datamining]
# Sep  6 12:07:26 ossim-devel su[9886]: FAILED su for root by juanma
event_type=event
# Group names sensor and src_ip are assumed; check them against the downloadable .cfg.
regexp="^(?P<logline>(\S+\s+\d+\s+\d\d:\d\d:\d\d)\s+(?P<sensor>[^\s]+)\s+(?P<src_ip>[^\s]+)\s+(?P<generator>[^\[]*)\[(?P<pid>\d+)\]:(?P<logged_event>.*))$"
sensor={resolv($sensor)}
date={normalize_date($1)}
plugin_sid=1
userdata1={md5sum($logline)}
userdata2={$logline}
userdata3={$generator}
userdata4={$logged_event}
userdata5={$pid}

 

This is quite a simple plugin: only one rule and a relatively simple regexp. You may notice the md5sum function being used, as well as a couple of others.

 

We don’t really need to extract the fields here, but doing so will make reporting for this plugin a very easy task.
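
To see what the rule extracts, the regexp can be tried outside the agent with Python’s re module, which uses the same (?P<name>...) syntax. This is only a sketch: the pattern is a reconstruction, and the group names sensor and src_ip in particular are assumptions, while the other names come from the fields the rule assigns.

```python
import re

# Assumed reconstruction of the rule's regexp; the group names "sensor" and
# "src_ip" are guesses, the others come from the fields the rule assigns.
SYSLOG_RE = re.compile(
    r"^(?P<logline>(\S+\s+\d+\s+\d\d:\d\d:\d\d)\s+(?P<sensor>[^\s]+)\s+"
    r"(?P<src_ip>[^\s]+)\s+(?P<generator>[^\[]*)\[(?P<pid>\d+)\]:"
    r"(?P<logged_event>.*))$"
)


def parse_line(line):
    """Return the named fields of a forwarded syslog line, or None."""
    match = SYSLOG_RE.match(line)
    return match.groupdict() if match else None


# Sample line taken from the remote logging test later in this tutorial.
fields = parse_line("Dec  6 07:10:12 10.0.1.3 Gestalt syslog[3997]: test")
print(fields)
```

With this reconstruction, a line that carries only one host field before the process name (such as the su example in the plugin comment) does not match at all, so it is worth testing the pattern against your own log lines before relying on it.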

 

Write the SQL

 

Next thing we have to do is make the server aware of our new plugin. We’ll assign it ID 4007 since it’s free and it’s in the range reserved for syslog, and insert only one event.

 

Please notice this event is being inserted with priority 0, more on this later.

 

 

 

DELETE FROM plugin WHERE id = 4007;
DELETE FROM plugin_sid WHERE plugin_id = 4007;
INSERT INTO plugin (id, type, name, description)
VALUES (4007, 1, 'syslog', 'Syslog plugin with md5 checksum logging');
INSERT INTO plugin_sid (plugin_id, sid, category_id, class_id, name, priority, reliability)
VALUES (4007, 1, NULL, NULL, 'Syslog: syslog entry', 0, 1);

 

 

 

Priority 0

 

Inserting the event with priority 0 is a little trick that helps us prevent these items from generating noise in our SIM system.

 

By default, each incoming event has an instant risk value calculated as:

 

risk = asset * priority * reliability / 25

 

So, what would happen if this plugin were activated and you enabled debugging on an app that generates a thousand log lines? Well, that host would be treated as compromised, alarms would be raised, and many indicators would be affected by this false positive.

 

Since we actually only want to store events, not assess any instant risk with them, one of the three variables in the multiplication should be set to 0 so that this type of event is ignored by the risk assessment system by default (the events would still be correlated, forwarded, could become alarms, etc.). We can’t control the assets, and the event’s reliability can only be modified by correlation rules. Wouldn’t it be nice to be able to decide which events to take into account based on our policy? Well, the obvious choice is priority, since correlation rules can affect both priority and reliability, while policies only affect priority.

 

That means if we want specific source ips or generators to behave differently, we can control it via policy, and if we want to generate alarms, correlate with other events or based on some of the extracted fields, we do it using directives (correlation rules).
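
The effect of priority 0 on the formula can be checked with a few lines of Python. This is a sketch of the formula as written above, not the server’s code. The value ranges used in the examples (0 to 5 for asset and priority, 0 to 10 for reliability) are an assumption and are not stated in this tutorial.

```python
def instant_risk(asset, priority, reliability):
    """Instant risk as given in the text: asset * priority * reliability / 25."""
    return asset * priority * reliability / 25


# Hypothetical asset value of 5, with reliability 1 as in the SQL above.
print(instant_risk(5, 0, 1))   # priority 0: the product, and so the risk, is 0
print(instant_risk(5, 2, 1))   # a policy raising priority to 2 gives a non-zero risk
```

Because the three values are multiplied, a zero priority cancels the risk no matter how valuable the asset is, and a policy that raises the priority for selected hosts brings the risk back for just those hosts.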

 

Enable syslog logging

 

There are better ways of getting all the logs into one file without duplicating them, but just for the record, I added “*.* /var/log/all.log” to /etc/syslog.conf and restarted the service. That way we can point our plugin at that file and forget about filtering.

 

Enable remote logging

 

Remote logging is similar: enter “*.* @agent_ip” in the syslog.conf of the sending host instead and restart syslogd. Here is my output on Mac OS X:

 

Gestalt:etc dk$ sudo vi syslog.conf
Gestalt:etc dk$ ps ax | grep syslog
   13   ??  Ss     0:04.84 /usr/sbin/syslogd
 3994 s004  R+     0:00.00 grep syslog
Gestalt:etc dk$ sudo kill -HUP 13
Gestalt:etc dk$ syslog -s test



On our tail -f /var/log/all.log:


Dec  6 07:10:12 10.0.1.3 Gestalt syslog[3997]: test
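
If the sending host has no syslog command line tool, a test message can also be sent from Python. This is a sketch using the standard library’s logging.handlers.SysLogHandler over UDP. The function name, the logger name and the default port 514 are assumptions about your setup, not something the agent requires.

```python
import logging
import logging.handlers


def send_test_message(agent_host, port=514, text="test"):
    """Send one syslog message over UDP to the host running the agent."""
    handler = logging.handlers.SysLogHandler(address=(agent_host, port))
    logger = logging.getLogger("syslog-forward-test")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(handler)
    try:
        logger.info(text)
    finally:
        # Detach and close so repeated calls do not stack handlers.
        logger.removeHandler(handler)
        handler.close()
```

Calling send_test_message("agent_ip") with the address of your agent should make a new line show up in the tail -f output, provided the receiving syslogd accepts remote messages.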

 

 

 

3. The implementation.

 

Update 2008/03/11: Fixed the regexp; as can be seen in the comments below, it was too narrow. Just re-download the .cfg.txt.

 

Copy plugin to its place

 

The plugin file should be put into /etc/ossim/agent/plugins/syslog.cfg.

 

Insert SQL

 

The SQL can be inserted from the server as:

 

cat syslog.sql.txt | mysql -p -uroot ossim

 

Note to installer users: you can get your database password from /etc/ossim/ossim_setup.conf as root by grepping for “pass”, since it gets generated randomly at install time for each installation. That password is also used for ntop, nessus, etc.

 

Restart server

 

There are nicer ways but this works:

 

killall ossim-server; ossim-server -d

 

Enable plugin

 

Add a line like this to the [plugins] section of your /etc/ossim/agent/config.cfg.

 

syslog=/etc/ossim/agent/plugins/syslog.cfg
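
Before restarting the agent it can be handy to confirm the line landed in the right section. The sketch below assumes config.cfg is a plain INI-style file that Python’s configparser can read. The sample text is a hypothetical excerpt and plugin_path is a made-up helper name.

```python
import configparser

# Hypothetical excerpt of /etc/ossim/agent/config.cfg; only the [plugins]
# section name and the syslog line come from the text.
SAMPLE_CONFIG = """
[plugins]
syslog=/etc/ossim/agent/plugins/syslog.cfg
"""


def plugin_path(config_text, name):
    """Return the .cfg path registered for a plugin, or None if absent."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    return parser.get("plugins", name, fallback=None)


print(plugin_path(SAMPLE_CONFIG, "syslog"))
```

To check a real installation, read the file’s contents into a string and pass that instead of SAMPLE_CONFIG.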

 

Restart agent

 

killall ossim-agent; ossim-agent -d

 

Does it work? RT event viewer.

 

The best way to see if an event travels all the way through generator -> agent -> server -> database is to fire up the RealTime event viewer and start it with “fast” selected.

 

Here you can see they’re starting to arrive:

 

(Image removed, broken link, I’m very sorry. DK.)

 

4. The results.

 

That’s it, the data is being fed into the database. What now?

 

Policy

 

First of all, remember that the events won’t affect our risk, but they’re still getting stored.

 

WARNING: If you abuse this, your database won’t be able to handle the load from all that data. You’re going to store it, but it will be of little use. Default OSSIM is not tuned for more than 1 to 1.5 million active events in the database. That’s more than enough for a small/home user, but nowhere near enough if you feed it with 2000 syslogging devices.

 

So what could you do? Store only part of the data using policies when you need it, while still correlating all events for more interesting stuff. This obviously doesn’t help if you came here looking for better compliance (see sections 5 and 6 below).

 

Following are a couple of hints about how to create a policy that only stores events from a single syslog host.

 

  1. Create a host: (Image removed, broken link, I’m very sorry. DK.)
  2. Create a plugin group with syslog plugins: (Image removed, broken link, I’m very sorry. DK.)
  3. Select the generating host as source (you could also apply this to a sensor, and any hosts it receives) (Image removed, broken link, I’m very sorry. DK.)
  4. Disable everything, just store (or leave correlation enabled for these events if you wish) (Image removed, broken link, I’m very sorry. DK.)
  5. Create a drop-everything else policy: (Image removed, broken link, I’m very sorry. DK.)
  6. The result: (Image removed, broken link, I’m very sorry. DK.)

 

Performance

 

I’ll just throw the warning in again:

 

WARNING: If you abuse this, your database won’t be able to handle the load from all that data. You’re going to store it, but it will be of little use. Default OSSIM is not tuned for more than 1 to 1.5 million active events in the database. That’s more than enough for a small/home user, but nowhere near enough if you feed it with 2000 syslogging devices.

 

Which basically means: if you’ve got few devices (at most 200000 events a day) you can go for “logging everything” with the default installer ossim. Otherwise you’ll have to heavily tune your database, enable “writing into filesystem” and/or play some other DB tricks. See section 6 below.

 

Event viewer

 

Since we’ve already separated our info into bits, besides keeping the whole original line, we can easily create a new event viewer panel for this specific plugin.

 

In the example below, I created one with four columns and label:data relations such as:

 

  • DATE:DATE
  • Checksum:USERDATA1
  • Generator:USERDATA3
  • Log:USERDATA4

 

(Image removed, broken link, I’m very sorry. DK.)

 

Note: Log is not the entire line; that would be USERDATA2, as you can see in our plugin.

 

Reports

 

What else do we need to be able to tell our boss that we’re now much closer to being compliant? Nifty graphs.

 

Let’s define a new tab (using a nice icon from images.google.com) and put two panels in there, a simple graph with our top 5 generated logs and a cloud with all the generators.

 

Tab: (Image removed, broken link, I’m very sorry. DK.)

 

Cloud definition: (Image removed, broken link, I’m very sorry. DK.)

 

Cloud SQL: (Image removed, broken link, I’m very sorry. DK.)

 

Resulting graph: (Image removed, broken link, I’m very sorry. DK.)

 

Note: The pie graph is of type “SQL”, using rows as labels, 45 degree rotation. Here is the export you can import into your panel:

 

 

 

plugin_custom_sql::

YTo0OntzOjY6InBsdWdpbiI7czoyMjoicGx

1Z2luX2NvbmZpZ19leGNoYW5nZSI7czoxMT

oicGx1Z2luX29wdHMiO2E6Mjc6e3M6ODoiZ

3JhcGhfZGIiO3M6NToic25vcnQiO3M6OToi

Z3JhcGhfc3FsIjtzOjIxOToic2VsZWN0IHV

zZXJkYXRhNCwgY291bnQoKikgYXMgbnVtIG

Zyb20gZXh0cmFfZGF0YSwgb3NzaW1fZXZlb

nQgd2hlcmUgb3NzaW1fZXZlbnQuc2lkID0g

ZXh0cmFfZGF0YS5zaWQgYW5kIG9zc2ltX2V

2ZW50LmNpZCA9IGV4dHJhX2RhdGEuY2lkIG

FuZCBvc3NpbV9ldmVudC5wbHVnaW5faWQgP

SA0MDA3IGdyb3VwIGJ5IHVzZXJkYXRhNCBv

cmRlciBieSBudW0gZGVzYyBsaW1pdCA1Owo

KIjtzOjExOiJncmFwaF90aXRsZSI7czoyMj

oiVG9wIDUgc3lzbG9nIHByb2Nlc3NlcyI7c

zoxMDoiZ3JhcGhfdHlwZSI7czozOiJwaWUi

O3M6MTg6ImdyYXBoX2xlZ2VuZF9maWVsZCI

7czozOiJyb3ciO3M6MTY6ImdyYXBoX3Bsb3

RzaGFkb3ciO3M6MToiMCI7czoxNToiZ3Jhc

GhfcGllX3RoZW1lIjtzOjU6IndhdGVyIjtz

OjE3OiJncmFwaF9waWVfM2RhbmdsZSI7czo

yOiI0NSI7czoxNzoiZ3JhcGhfcGllX2V4cG

xvZGUiO3M6NDoibm9uZSI7czoyMToiZ3Jhc

GhfcGllX2V4cGxvZGVfcG9zIjtzOjE6IjEi

O3M6MjI6ImdyYXBoX3BpZV9hbnRpYWxpYXN

pbmciO3M6MToiMSI7czoxNjoiZ3JhcGhfcG

llX2NlbnRlciI7czozOiIwLjIiO3M6MTg6I

mdyYXBoX3BvaW50X2xlZ2VuZCI7czowOiIi

O3M6MTc6ImdyYXBoX3Nob3dfdmFsdWVzIjt

zOjE6IjEiO3M6MTE6ImdyYXBoX2NvbG9yIj

tzOjc6IiMwMDAwODAiO3M6MTQ6ImdyYXBoX

2dyYWRpZW50IjtzOjE6IjAiO3M6MTA6Imdy

YXBoX2xpbmsiO3M6MDoiIjtzOjE2OiJncmF

waF9yYWRhcl9maWxsIjtzOjE6IjEiO3M6MT

E6ImdyYXBoX3lfbWluIjtzOjE6IjAiO3M6M

TE6ImdyYXBoX3lfbWF4IjtzOjE6IjAiO3M6

MTE6ImdyYXBoX3hfbWluIjtzOjE6IjAiO3M

6MTE6ImdyYXBoX3hfbWF4IjtzOjE6IjAiO3

M6MTE6ImdyYXBoX3lfdG9wIjtzOjE6IjAiO

3M6MTE6ImdyYXBoX3lfYm90IjtzOjE6IjAi

O3M6MTE6ImdyYXBoX3hfdG9wIjtzOjE6IjA

iO3M6MTE6ImdyYXBoX3hfYm90IjtzOjE6Ij

AiO3M6MTU6ImV4cG9ydGVkX3BsdWdpbiI7c

zoxNzoicGx1Z2luX2N1c3RvbV9zcWwiO31z

OjExOiJ3aW5kb3dfb3B0cyI7YTozOntzOjI

6ImlkIjtzOjM6IjF4MSI7czo1OiJ0aXRsZS

I7czoxMDoiVG9wIDUgbG9ncyI7czo0OiJoZ

WxwIjtzOjA6IiI7fXM6MTE6Im1ldHJpY19v

cHRzIjthOjQ6e3M6MTQ6ImVuYWJsZV9tZXR

yaWNzIjtzOjE6IjAiO3M6MTA6Im1ldHJpY1

9zcWwiO3M6OToic2VsZWN0IDQ7IjtzOjEzO

iJsb3dfdGhyZXNob2xkIjtzOjE6IjMiO3M6

MTQ6ImhpZ2hfdGhyZXNob2xkIjtzOjE6IjU

iO319
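
The export appears to be a plugin_custom_sql:: prefix followed by base64 text, so it can be inspected before importing it. The following is a minimal Python sketch. decode_panel_export is a hypothetical helper, and the sample string is just the first 24 characters of the export above.

```python
import base64


def decode_panel_export(export_text):
    """Split 'name::base64' panel export text and decode the payload."""
    name, _, payload = export_text.partition("::")
    # The export is wrapped over many lines; drop all whitespace first.
    b64 = "".join(payload.split())
    decoded = base64.b64decode(b64).decode("utf-8", errors="replace")
    return name.strip(), decoded


# First 24 base64 characters of the export, split over two lines on purpose.
sample = "plugin_custom_sql::\nYTo0OntzOjY6InBsdWdp\nbiI7"
print(decode_panel_export(sample))
```

Run against the full export, the decoded text looks like PHP-serialized panel settings, and it appears to include the SQL behind the pie graph: a top 5 count grouped by userdata4 for plugin_id 4007.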

 

5. The tuning/solutions.

 

After the two warnings, here are some solutions to those problems.

 

Database rotation

 

First we implement database rotation every backup_day days. By default this is set to 5, so if you are collecting a maximum of 200,000 to 300,000 syslog events a day, you should be fine. If your daily volume is much lower, you can increase the number of days; if it’s higher, decrease it. That’s the quick and dirty solution.
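
A quick back-of-the-envelope check of those numbers, assuming rotation simply keeps backup_day days of events and taking the 1 to 1.5 million event ceiling from the warning in section 4. Both function names are hypothetical.

```python
def retained_events(events_per_day, backup_day=5):
    """Approximate number of events held between rotations."""
    return events_per_day * backup_day


def max_backup_day(events_per_day, event_budget=1_500_000):
    """Largest rotation interval (in days) that stays within the event budget."""
    return max(1, event_budget // events_per_day)


print(retained_events(200_000))   # 1000000
print(retained_events(300_000))   # 1500000
print(max_backup_day(500_000))    # 3
```

With the default of 5 days, 200,000 to 300,000 events a day lands right at the 1 to 1.5 million range, which is why higher volumes call for a shorter interval.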

 

File storage

 

A friend of mine is working on filesystem storage in the form of /date/sensor/plugin/ip.log; the hooks are there for that format and it should be available very soon.
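
To make that layout concrete, here is a small sketch of how such a path could be built. Only the date/sensor/plugin/ip.log order comes from the text. The root directory, the date format and the function name are hypothetical.

```python
import posixpath


def log_path(root, date, sensor, plugin, ip):
    """Build root/date/sensor/plugin/ip.log; all argument formats are assumed."""
    return posixpath.join(root, date, sensor, plugin, ip + ".log")


# Hypothetical root directory and date format.
print(log_path("/var/ossim/logs", "2007-12-06", "10.0.1.3", "syslog", "10.0.1.3"))
```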

 

6. The spam.

 

I really hate spamming in a community article, so please don’t read any further if you’re satisfied with what you’ve got so far and don’t have any special requirements.

 

Compliance module: Performance, storage, reporting.

 

The founders of ossim created a company about a year ago called (in order to be original) ossim. The site is ossim.com, and there we provide a series of value-added solutions and services.

 

Since all of this compliance stuff is really a high-level enterprise need, we’ve spent a lot of time packaging a few custom modules just for that purpose, and are continuously improving them in parallel with normal ossim evolution.

 

So the compliance modules are provided as a sort of add-on to the open ossim distribution, since we feel that part is of little use to the average community user, and we would like the companies that are saving many hundreds of thousands of dollars by using our open source solution to put something back into development.

 

Professional compliance modules include:

 

  • Appliance version for easy deployment
    • Heavily tuned database, using a combination of Heap, Merge, InnoDB, MyISAM and compressed MyISAM tables for optimal performance / storage capacity.
    • 64Bit version
    • MMAP compiled libpcap applications for greatly enhanced performance (Up to Gigabyte speeds with a standard dual core machine)
    • Sets of custom flash graphs for specific compliance needs
    • Sets of custom viewers for specific compliance needs
  • Guides for particular compliance needs. “What exactly do I have to setup in ossim to be more compliant with XXXX”
  • Specific support
  • Specific compliance rule feeds
