Tutorial 6: Plugin writing primer
Tue, 11 Mar 2008

A couple of days ago I was fixing the fortinet/fortigate with the kind help of a Swiss OSSIM user (thanks Mikael ;-) ) and I wrote this little piece of python in order to help me out with it. Now I'm using it a lot to debug plugins so I guess more people could benefit from this also :-)
And well, I'll paste a sample plugin debugging session in order to give ideas.
BTW, this assumes basic knowledge of regular expressions, check this Regexp Primer out if you want to refresh that knowledge. And BTW2, some log lines are broken for readability.


::start here

#!/usr/bin/python 
import sys,re

if sys.argv[3] is None:
        print "Args are filename, regexp and [0|1]"

f = open(sys.argv[1], 'r')
data = f.readlines()
exp=sys.argv[2]

print sys.argv[2]

line_match = 0

matched = 0

for line in data:
	result = re.findall(exp,line)
	try:
		tmp = result[0]
	except IndexError:
		if sys.argv[3] is "1":
			print "Not matched:", line
		continue
	print result
	matched += 1


print "Counted", len(data), "lines."
print "Matched", matched, "lines."

Basically it will take a logfile as input, a regexp and wether to verbosely show the matched lines or not. That way you can start working towards more complex regexps and test it against a full logfile in realtime.

regexp.py logfile "regexp" (0|1) # 0 == do not show "non-matching lines"

Let's go through a simple file containing logs (got it from http://www.ossec.net/wiki/index.php/PostgreSQL_Samples).

[2007-08-31 19:22:21.469 ADT] :[unknown] LOG:  connection received: host=192.168.2.99 port=52136
[2007-08-31 19:22:21.485 ADT] 192.168.2.99:ossecdb LOG:  connection authorized: user=ossec_user 
database=ossecdb
[2007-08-31 19:22:22.427 ADT] 192.168.2.99:ossecdb LOG:  disconnection: session time: 0:00:00.95 
user=ossec_user database=ossecdb host=192.168.2.99 port=52136
[2007-09-27 11:02:44.941 ADT] 192.168.2.10:ossecdb ERROR:  relation "lala" does not exist
[2007-09-27 11:02:46.444 ADT] 192.168.2.10:ossecdb LOG:  disconnection: session time: 0:00:35.79 
user=ossec_user database=ossecdb host=192.168.2.10 port=3584

Log messages:
[2007-09-01 07:14:41.062 ADT] : LOG:  autovacuum: processing database "template1"
[2007-09-01 07:15:41.079 ADT] : LOG:  autovacuum: processing database "ossecdb"

Query log:
[2007-09-01 16:44:49.244 ADT] 192.168.2.10:ossecdb LOG:  duration: 4.550 ms  statement: 
SELECT id FROM location WHERE name = 'enigma->/var/log/messages' AND server_id = '1'
[2007-09-01 16:44:49.251 ADT] 192.168.2.10:ossecdb LOG:  duration: 5.252 ms  statement: 
INSERT INTO location(server_id, name) VALUES ('1', 'enigma->/var/log/messages')
[2007-09-01 16:44:49.252 ADT] 192.168.2.10:ossecdb LOG:  duration: 0.016 ms  statement: 
SELECT id FROM location WHERE name = 'enigma->/var/log/messages' AND server_id = '1'

[2007-09-27 11:02:51.611 ADT] 192.168.2.10:ossecdb LOG:  statement: INSERT INTO 
alert(id,server_id,rule_id,timestamp,location_id,src_ip) 
VALUES ('3577', '1', '50503','1190916566', '140', '0')

Query error:
[2007-08-31 19:17:42.128 ADT] 192.168.2.99:test ERROR:  relation "alertaaa" does not exist
[2007-08-31 19:17:46.375 ADT] 192.168.2.99:test ERROR:  syntax error at or near "a" at character 1
[2007-09-27 11:02:44.941 ADT] 192.168.2.10:ossecdb ERROR:  relation "lala" does not exist

Authentication error:
[2007-09-01 19:08:49.862 ADT] : LOG:  connection received: host=192.168.2.99 port=37142
[2007-09-01 19:08:49.869 ADT] 192.168.2.99: FATAL:  password authentication failed for user "ossec_user"

The actual regexp debugging process

1. Start out simple

Gestalt:tmp dk$ regexp.py postgresql.log.txt "(^.*$)" 0 | tail -n 2
['[2007-09-01 19:08:49.869 ADT] 192.168.2.99: FATAL:  password authentication failed for user "ossec_user"', '', '']
Counted 25 lines.
Matched 25 lines.
Gestalt:tmp dk$ 

The tail after the command is there because by default it shows all the matching lines, and one is enough for our sample.
The used regexp is (^.*$) which as you know matches everything from the beginning to the end.

2. Basic separation

Gestalt:tmp dk$ regexp.py postgresql.log.txt "^\[(?P<date>\S+\s+\S+)\.\d+\s+ADT\]\s+(.*)$" 0
^\[(?P<date>\S+\s+\S+)\.\d+\s+ADT\]\s+(.*)$
[('2007-08-31 19:22:21', ':[unknown] LOG:  connection received: host=192.168.2.99 port=52136')]
[('2007-08-31 19:22:21', '192.168.2.99:ossecdb LOG:  connection authorized: 
user=ossec_user database=ossecdb')]
[('2007-08-31 19:22:22', '192.168.2.99:ossecdb LOG:  disconnection: session time: 0:00:00.95 
user=ossec_user database=ossecdb host=192.168.2.99 port=52136')]
[('2007-09-27 11:02:44', '192.168.2.10:ossecdb ERROR:  relation "lala" does not exist')]
[('2007-09-27 11:02:46', '192.168.2.10:ossecdb LOG:  disconnection: session time: 0:00:35.79 
user=ossec_user database=ossecdb host=192.168.2.10 port=3584')]
[('2007-09-01 07:14:41', ': LOG:  autovacuum: processing database "template1"')]
[('2007-09-01 07:15:41', ': LOG:  autovacuum: processing database "ossecdb"')]
[('2007-09-01 16:44:49', "192.168.2.10:ossecdb LOG:  duration: 4.550 ms  statement: 
SELECT id FROM location WHERE name = 'enigma->/var/log/messages' AND server_id = '1'")]
[('2007-09-01 16:44:49', "192.168.2.10:ossecdb LOG:  duration: 5.252 ms  statement: 
INSERT INTO location(server_id, name) VALUES ('1', 'enigma->/var/log/messages')")]
[('2007-09-01 16:44:49', "192.168.2.10:ossecdb LOG:  duration: 0.016 ms  statement: 
SELECT id FROM location WHERE name = 'enigma->/var/log/messages' AND server_id = '1'")]
[('2007-09-27 11:02:51', "192.168.2.10:ossecdb LOG:  statement: 
INSERT INTO alert(id,server_id,rule_id,timestamp,location_id,src_ip) 
VALUES ('3577', '1', '50503','1190916566', '140', '0')")]
[('2007-08-31 19:17:42', '192.168.2.99:test ERROR:  relation "alertaaa" does not exist')]
[('2007-08-31 19:17:46', '192.168.2.99:test ERROR:  syntax error at or near "a" at character 1')]
[('2007-09-27 11:02:44', '192.168.2.10:ossecdb ERROR:  relation "lala" does not exist')]
[('2007-09-01 19:08:49', ': LOG:  connection received: host=192.168.2.99 port=37142')]
[('2007-09-01 19:08:49', '192.168.2.99: FATAL:  password authentication failed for user "ossec_user"')]
Counted 25 lines.
Matched 16 lines.
Gestalt:tmp dk$ 

Here we make a first attempt at separating the original datetime from the rest of the line. Additionally we've filtered out the junk lines that might appear in the file, so now we only match 16 lines (the actual 16 lines that contain valid log files). Out regexp starts looking uglier now, but still understandable ;-): ^\[(?P<date>\S+\s+\S+)\.\d+\s+ADT\]\s+(.*)$

3. Extracting the database host and actual DB name

Gestalt:tmp dk$ regexp.py postgresql.log.txt 
"^\[(?P<date>\S+\s+\S+)\.\d+\s+ADT\]\s+(?P<dbhost>[^:]+)?:(?P<dbname>\S+)?\s+(.*)$" 0 | tail -n 3
[('2007-09-01 19:08:49', '192.168.2.99', '', 'FATAL:  password authentication failed for user "ossec_user"')]
Counted 25 lines.
Matched 16 lines.

This time I'm facing the first problems. In order to get the output shown above I had to go through a bit of try and error.
Starting with a first try I see there are 5 lines not matching, since we know it's 16 we need to match:

Gestalt:tmp dk$ regexp.py postgresql.log.txt "^\[(?P<date>\S+\s+\S+)\.\d+\s+ADT\]\s+([^:]+):(\S+)\s+(.*)$" 0 | tail -n 2
Counted 25 lines.
Matched 11 lines.
Gestalt:tmp dk$ 
Let's see which ones are not matching (notice the change from 0 to 1 at the end of the python's ARGV):
Gestalt:tmp dk$ regexp.py postgresql.log.txt "^\[(?P<date>\S+\s+\S+)\.\d+\s+ADT\]\s+([^:]+):(\S+)\s+(.*)$" 1 | grep "Not matched: \["
Not matched: [2007-08-31 19:22:21.469 ADT] :[unknown] LOG:  connection received: host=192.168.2.99 port=52136
Not matched: [2007-09-01 07:14:41.062 ADT] : LOG:  autovacuum: processing database "template1"
Not matched: [2007-09-01 07:15:41.079 ADT] : LOG:  autovacuum: processing database "ossecdb"
Not matched: [2007-09-01 19:08:49.862 ADT] : LOG:  connection received: host=192.168.2.99 port=37142
Not matched: [2007-09-01 19:08:49.869 ADT] 192.168.2.99: FATAL:  password authentication failed for user "ossec_user"
Gestalt:tmp dk$ 
Ok, understood, either DB host or DB name are optional, so let's add that to our regexp, getting a final regexp of: ^\[(?P<date>\S+\s+\S+)\.\d+\s+ADT\]\s+(?P<dbhost>[^:]+)?:(?P<dbname>\S+)?\s+(.*)$
On the matching log line we can also see how the db host and name are being correctly extracted ;-).

4. Got a plugin_sid value

Gestalt:tmp dk$ regexp.py postgresql.log.txt 
"^\[(?P<date>\S+\s+\S+)\.\d+\s+ADT\]\s+(?P<dbhost>[^:]+)?:(?P<dbname>\S+)?\s+(?P<sid>[^:]+):\s+(?P<log_msg>.*)$" 0 | tail -n 3
[('2007-09-01 19:08:49', '192.168.2.99', '', 'FATAL', 'password authentication failed for user "ossec_user"')]
Counted 25 lines.
Matched 16 lines.
Gestalt:tmp dk$ 

Looking at the remainder string it starts getting obvious that the next field is sort of a priority value. In this first phase the postfix plugin will remain somewhat crippled since I'm going to use that field as plugin sid, instead of identifying each of the log types by what they actually do. (That's left as a task for the reader).
Our resulting regexp, extracting the priority and removing some white space in front of the logline would be: ^\[(?P<date>\S+\s+\S+)\.\d+\s+ADT\]\s+(?P<dbhost>[^:]+)?:(?P<dbname>\S+)?\s+(?P<sid>[^:]+):\s+(?P<log_msg>.*)$

Final Roundup

The plugin is far from being finished, but the goal of this tutorial was to demistify the regexp part a bit, since it's actually only based on a bit of research about the log (possible values for fields), a bit of intuition and lots of try && error.
I hope that goal has been met.

To be continued... (finishing the regexp, writing the .cfg file and writing the .sql file)

posted at: 11:38 | path: /ossim/tutorials | permanent link to this entry | 2 comments |
Tags: , ,



* Posted by Gabriel at Tue Mar 11 12:04:29 2008
hello

For curious readers i would suggest reading the links on this page:

http://swtch.com/~rsc/regexp/

There are links to various regular expression implementations in C, comenting about performance, functionality and theory.
* Posted by Calvin Mills at Wed Mar 26 18:40:52 2008
I have a plugin that has been working with only one regex and sid test.  But after reading this I figured I would add another test to the plugin, and when I tested the regex against the log it works fine, but when I insert it into my plugin, it never gets caught??? 

Everything is still caught by the first regex sid 1.  Is there a particular order that they need to be in?  I have added them to database, and I have restarted many times.

I am testing the line
Mar 26 11:20:33 192.168.0.13 home kernel: quickcam: frame lost

with the regex
(?P<logline>(\S+\s+\d+\s+\d\d:\d\d:\d\d)\s+(?P<sensor>[^\s]+)\s+(?P<source>[^\s]+)\s+(?P<generator>[^\:]*):(?P<logged_event>.*? quickcam.*))$

and my full plugin config is

;; syslog
;; plugin_id: 4007
;; type: detector
;;

[DEFAULT]
plugin_id=4007

[config]
type=detector
enable=yes

source=log
# Enable syslog to log everything to one file. Add it to log rotation also.
# echo "*.*  /var/log/all.log" >> /etc/syslog.conf; killall -HUP syslogd
location=/var/log/all.log

# create log file if it does not exists,
# otherwise stop processing this plugin
create_file=true

process=syslogd  ; -r || -u
start=no  ; launch plugin process when agent starts
stop=no  ; shutdown plugin process when agent stops
startup=/etc/init.d/%(process)s start
shutdown=/etc/init.d/%(process)s stop

## rules
[syslog - datamining - fallthrough]
#
event_type=event
regexp="^(?P<logline>(\S+\s+\d+\s+\d\d:\d\d:\d\d)\s+(?P<sensor>[^\s]+)\s+(?P<source>[^\s]+)\s+(?P<generator>[^\:]*):(?P<logged_event>.*))$"
sensor={resolv($sensor)}
date={normalize_date($1)}
plugin_sid=1
#userdata1={md5sum($logline)}
userdata1={$source}
userdata2={$generator}
userdata3={$logged_event}

[syslog - datamining - quickcam]
#
event_type=event
regexp="^(?P<logline>(\S+\s+\d+\s+\d\d:\d\d:\d\d)\s+(?P<sensor>[^\s]+)\s+(?P<source>[^\s]+)\s+(?P<generator>[^\:]*):(?P<logged_event>.*? quickcam.*))$"
sensor={resolv($sensor)}
date={normalize_date($1)}
plugin_sid=3
#userdata1={md5sum($logline)}
userdata1={$source}
userdata2={$generator}
userdata3={$logged_event}

Name:


E-mail:


URL:


Comment:


Categories

/ (37)
    code/ (1)
    feed/ (1)
    ossim/ (24)
        installer/ (3)
        plugins/ (2)
        tuning/ (3)
        tutorials/ (7)
    personal/ (10)
        campus/ (2)
        opinion/ (1)
        travel/ (1)
    rants/ (1)




RSS




< March 2008 >
MoTuWeThFrSaSu
      1 2
3 4 5 6 7 8 9
10111213141516
17181920212223
24252627282930
31      




Archives

2008-Dec
2008-Oct
2008-Aug
2008-Jul
2008-May
2008-Mar
2008-Feb
2008-Jan
2007-Dec
2007-Nov




Tags




Made with PyBlosxom