Tutorial 6: Plugin writing primer

March 11, 2008  |  Dominique Karg

A couple of days ago I was fixing the fortinet/fortigate with the kind help of a Swiss OSSIM user (thanks Mikael :wink: ) and I wrote this little piece of python in order to help me out with it. Now I’m using it a lot to debug plugins so I guess more people could benefit from this also :blush:
And well, I’ll paste a sample plugin debugging session in order to give ideas.

BTW, this assumes basic knowledge of regular expressions. And BTW2, some log lines are broken for readability.


#!/usr/bin/python 

import sys,re



if sys.argv[3] is None:

        print "Args are filename, regexp and [0|1]"



f = open(sys.argv[1], 'r')

data = f.readlines()

exp=sys.argv[2]



print sys.argv[2]



line_match = 0



matched = 0



for line in data:

	result = re.findall(exp,line)

	try:

		tmp = result[0]

	except IndexError:

		if sys.argv[3] is "1":

			print "Not matched:", line

		continue

	print result

	matched += 1





print "Counted", len(data), "lines."

print "Matched", matched, "lines."

 

 

Basically it will take a logfile as input, a regexp and wether to verbosely show the matched lines or not. That way you can start working towards more complex regexps and test it against a full logfile in realtime.

regexp.py logfile “regexp” (0|1) # 0 == do not show “non-matching lines”

Let’s go through a simple file containing logs (got it from http://www.ossec.net/wiki/index.php/PostgreSQL_Samples).


[2007-08-31 19:22:21.469 ADT] :[unknown] LOG:  connection received: host=192.168.2.99 port=52136

[2007-08-31 19:22:21.485 ADT] 192.168.2.99:ossecdb LOG:  connection authorized: user=ossec_user 

database=ossecdb

[2007-08-31 19:22:22.427 ADT] 192.168.2.99:ossecdb LOG:  disconnection: session time: 0:00:00.95 

user=ossec_user database=ossecdb host=192.168.2.99 port=52136

[2007-09-27 11:02:44.941 ADT] 192.168.2.10:ossecdb ERROR:  relation "lala" does not exist

[2007-09-27 11:02:46.444 ADT] 192.168.2.10:ossecdb LOG:  disconnection: session time: 0:00:35.79 

user=ossec_user database=ossecdb host=192.168.2.10 port=3584



Log messages:

[2007-09-01 07:14:41.062 ADT] : LOG:  autovacuum: processing database "template1"

[2007-09-01 07:15:41.079 ADT] : LOG:  autovacuum: processing database "ossecdb"



Query log:

[2007-09-01 16:44:49.244 ADT] 192.168.2.10:ossecdb LOG:  duration: 4.550 ms  statement: 

SELECT id FROM location WHERE name = 'enigma->/var/log/messages' AND server_id = '1'

[2007-09-01 16:44:49.251 ADT] 192.168.2.10:ossecdb LOG:  duration: 5.252 ms  statement: 

INSERT INTO location(server_id, name) VALUES ('1', 'enigma->/var/log/messages')

[2007-09-01 16:44:49.252 ADT] 192.168.2.10:ossecdb LOG:  duration: 0.016 ms  statement: 

SELECT id FROM location WHERE name = 'enigma->/var/log/messages' AND server_id = '1'



[2007-09-27 11:02:51.611 ADT] 192.168.2.10:ossecdb LOG:  statement: INSERT INTO 

alert(id,server_id,rule_id,timestamp,location_id,src_ip) 

VALUES ('3577', '1', '50503','1190916566', '140', '0')



Query error:

[2007-08-31 19:17:42.128 ADT] 192.168.2.99:test ERROR:  relation "alertaaa" does not exist

[2007-08-31 19:17:46.375 ADT] 192.168.2.99:test ERROR:  syntax error at or near "a" at character 1

[2007-09-27 11:02:44.941 ADT] 192.168.2.10:ossecdb ERROR:  relation "lala" does not exist



Authentication error:

[2007-09-01 19:08:49.862 ADT] : LOG:  connection received: host=192.168.2.99 port=37142

[2007-09-01 19:08:49.869 ADT] 192.168.2.99: FATAL:  password authentication failed for user "ossec_user"

The actual regexp debugging process

1. Start out simple


Gestalt:tmp dk$ regexp.py postgresql.log.txt "(^.*$)" 0 | tail -n 2

['[2007-09-01 19:08:49.869 ADT] 192.168.2.99: FATAL:  password authentication failed for user "ossec_user"', '', '']

Counted 25 lines.

Matched 25 lines.

Gestalt:tmp dk$ 

The tail after the command is there because by default it shows all the matching lines, and one is enough for our sample.
The used regexp is (^.*$) which as you know matches everything from the beginning to the end.

2. Basic separation


Gestalt:tmp dk$ regexp.py postgresql.log.txt "^[(?PS+s+S+).d+s+ADT]s+(.*)$" 0

^[(?PS+s+S+).d+s+ADT]s+(.*)$

[('2007-08-31 19:22:21', ':[unknown] LOG:  connection received: host=192.168.2.99 port=52136')]

[('2007-08-31 19:22:21', '192.168.2.99:ossecdb LOG:  connection authorized: 

user=ossec_user database=ossecdb')]

[('2007-08-31 19:22:22', '192.168.2.99:ossecdb LOG:  disconnection: session time: 0:00:00.95 

user=ossec_user database=ossecdb host=192.168.2.99 port=52136')]

[('2007-09-27 11:02:44', '192.168.2.10:ossecdb ERROR:  relation "lala" does not exist')]

[('2007-09-27 11:02:46', '192.168.2.10:ossecdb LOG:  disconnection: session time: 0:00:35.79 

user=ossec_user database=ossecdb host=192.168.2.10 port=3584')]

[('2007-09-01 07:14:41', ': LOG:  autovacuum: processing database "template1"')]

[('2007-09-01 07:15:41', ': LOG:  autovacuum: processing database "ossecdb"')]

[('2007-09-01 16:44:49', "192.168.2.10:ossecdb LOG:  duration: 4.550 ms  statement: 

SELECT id FROM location WHERE name = 'enigma->/var/log/messages' AND server_id = '1'")]

[('2007-09-01 16:44:49', "192.168.2.10:ossecdb LOG:  duration: 5.252 ms  statement: 

INSERT INTO location(server_id, name) VALUES ('1', 'enigma->/var/log/messages')")]

[('2007-09-01 16:44:49', "192.168.2.10:ossecdb LOG:  duration: 0.016 ms  statement: 

SELECT id FROM location WHERE name = 'enigma->/var/log/messages' AND server_id = '1'")]

[('2007-09-27 11:02:51', "192.168.2.10:ossecdb LOG:  statement: 

INSERT INTO alert(id,server_id,rule_id,timestamp,location_id,src_ip) 

VALUES ('3577', '1', '50503','1190916566', '140', '0')")]

[('2007-08-31 19:17:42', '192.168.2.99:test ERROR:  relation "alertaaa" does not exist')]

[('2007-08-31 19:17:46', '192.168.2.99:test ERROR:  syntax error at or near "a" at character 1')]

[('2007-09-27 11:02:44', '192.168.2.10:ossecdb ERROR:  relation "lala" does not exist')]

[('2007-09-01 19:08:49', ': LOG:  connection received: host=192.168.2.99 port=37142')]

[('2007-09-01 19:08:49', '192.168.2.99: FATAL:  password authentication failed for user "ossec_user"')]

Counted 25 lines.

Matched 16 lines.

Gestalt:tmp dk$ 

Here we make a first attempt at separating the original datetime from the rest of the line. Additionally we’ve filtered out the junk lines that might appear in the file, so now we only match 16 lines (the actual 16 lines that contain valid log files). Out regexp starts looking uglier now, but still understandable :wink:: ^[(?PS+s+S+).d+s+ADT]s+(.*)$

3. Extracting the database host and actual DB name


Gestalt:tmp dk$ regexp.py postgresql.log.txt 

"^[(?PS+s+S+).d+s+ADT]s+(?P[^:]+)?:(?PS+)?s+(.*)$" 0 | tail -n 3

[('2007-09-01 19:08:49', '192.168.2.99', '', 'FATAL:  password authentication failed for user "ossec_user"')]

Counted 25 lines.

Matched 16 lines.

This time I’m facing the first problems. In order to get the output shown above I had to go through a bit of try and error.
Starting with a first try I see there are 5 lines not matching, since we know it’s 16 we need to match:


Gestalt:tmp dk$ regexp.py postgresql.log.txt "^[(?PS+s+S+).d+s+ADT]s+([^:]+):(S+)s+(.*)$" 0 | tail -n 2

Counted 25 lines.

Matched 11 lines.

Gestalt:tmp dk$ 

Let’s see which ones are not matching (notice the change from 0 to 1 at the end of the python’s ARGV):


Gestalt:tmp dk$ regexp.py postgresql.log.txt "^[(?PS+s+S+).d+s+ADT]s+([^:]+):(S+)s+(.*)$" 1 | grep "Not matched: ["

Not matched: [2007-08-31 19:22:21.469 ADT] :[unknown] LOG:  connection received: host=192.168.2.99 port=52136

Not matched: [2007-09-01 07:14:41.062 ADT] : LOG:  autovacuum: processing database "template1"

Not matched: [2007-09-01 07:15:41.079 ADT] : LOG:  autovacuum: processing database "ossecdb"

Not matched: [2007-09-01 19:08:49.862 ADT] : LOG:  connection received: host=192.168.2.99 port=37142

Not matched: [2007-09-01 19:08:49.869 ADT] 192.168.2.99: FATAL:  password authentication failed for user "ossec_user"

Gestalt:tmp dk$ 

Ok, understood, either DB host or DB name are optional, so let’s add that to our regexp, getting a final regexp of: ^[(?PS+s+S+).d+s+ADT]s+(?P[^:]+)?:(?PS+)?s+(.*)$

 

On the matching log line we can also see how the db host and name are being correctly extracted :wink:.

 

4. Got a plugin_sid value


Gestalt:tmp dk$ regexp.py postgresql.log.txt 

"^[(?PS+s+S+).d+s+ADT]s+(?P[^:]+)?:(?PS+)?s+(?P[^:]+):s+(?P.*)$" 0 | tail -n 3

[('2007-09-01 19:08:49', '192.168.2.99', '', 'FATAL', 'password authentication failed for user "ossec_user"')]

Counted 25 lines.

Matched 16 lines.

Gestalt:tmp dk$ 

Looking at the remainder string it starts getting obvious that the next field is sort of a priority value. In this first phase the postfix plugin will remain somewhat crippled since I’m going to use that field as plugin sid, instead of identifying each of the log types by what they actually do. (That’s left as a task for the reader).
Our resulting regexp, extracting the priority and removing some white space in front of the logline would be: ^[(?PS+s+S+).d+s+ADT]s+(?P[^:]+)?:(?PS+)?s+(?P[^:]+):s+(?P.*)$

Final Roundup

The plugin is far from being finished, but the goal of this tutorial was to demistify the regexp part a bit, since it’s actually only based on a bit of research about the log (possible values for fields), a bit of intuition and lots of try && error.
I hope that goal has been met.

To be continued… (finishing the regexp, writing the .cfg file and writing the .sql file)

Share this with others

Get price Free trial