![]() |
Tutorial 6: Plugin writing primer Tue, 11 Mar 2008
A couple of days ago I was fixing the fortinet/fortigate with the kind help of a Swiss OSSIM user (thanks Mikael ;-) ) and I wrote this little piece of python in order to help me out with it. Now I'm using it a lot to debug plugins so I guess more people could benefit from this also :-)
::start here
#!/usr/bin/python
import sys,re
if sys.argv[3] is None:
print "Args are filename, regexp and [0|1]"
f = open(sys.argv[1], 'r')
data = f.readlines()
exp=sys.argv[2]
print sys.argv[2]
line_match = 0
matched = 0
for line in data:
result = re.findall(exp,line)
try:
tmp = result[0]
except IndexError:
if sys.argv[3] is "1":
print "Not matched:", line
continue
print result
matched += 1
print "Counted", len(data), "lines."
print "Matched", matched, "lines."
Basically it will take a logfile as input, a regexp and wether to verbosely show the matched lines or not. That way you can start working towards more complex regexps and test it against a full logfile in realtime. regexp.py logfile "regexp" (0|1) # 0 == do not show "non-matching lines" Let's go through a simple file containing logs (got it from http://www.ossec.net/wiki/index.php/PostgreSQL_Samples).
[2007-08-31 19:22:21.469 ADT] :[unknown] LOG: connection received: host=192.168.2.99 port=52136
[2007-08-31 19:22:21.485 ADT] 192.168.2.99:ossecdb LOG: connection authorized: user=ossec_user
database=ossecdb
[2007-08-31 19:22:22.427 ADT] 192.168.2.99:ossecdb LOG: disconnection: session time: 0:00:00.95
user=ossec_user database=ossecdb host=192.168.2.99 port=52136
[2007-09-27 11:02:44.941 ADT] 192.168.2.10:ossecdb ERROR: relation "lala" does not exist
[2007-09-27 11:02:46.444 ADT] 192.168.2.10:ossecdb LOG: disconnection: session time: 0:00:35.79
user=ossec_user database=ossecdb host=192.168.2.10 port=3584
Log messages:
[2007-09-01 07:14:41.062 ADT] : LOG: autovacuum: processing database "template1"
[2007-09-01 07:15:41.079 ADT] : LOG: autovacuum: processing database "ossecdb"
Query log:
[2007-09-01 16:44:49.244 ADT] 192.168.2.10:ossecdb LOG: duration: 4.550 ms statement:
SELECT id FROM location WHERE name = 'enigma->/var/log/messages' AND server_id = '1'
[2007-09-01 16:44:49.251 ADT] 192.168.2.10:ossecdb LOG: duration: 5.252 ms statement:
INSERT INTO location(server_id, name) VALUES ('1', 'enigma->/var/log/messages')
[2007-09-01 16:44:49.252 ADT] 192.168.2.10:ossecdb LOG: duration: 0.016 ms statement:
SELECT id FROM location WHERE name = 'enigma->/var/log/messages' AND server_id = '1'
[2007-09-27 11:02:51.611 ADT] 192.168.2.10:ossecdb LOG: statement: INSERT INTO
alert(id,server_id,rule_id,timestamp,location_id,src_ip)
VALUES ('3577', '1', '50503','1190916566', '140', '0')
Query error:
[2007-08-31 19:17:42.128 ADT] 192.168.2.99:test ERROR: relation "alertaaa" does not exist
[2007-08-31 19:17:46.375 ADT] 192.168.2.99:test ERROR: syntax error at or near "a" at character 1
[2007-09-27 11:02:44.941 ADT] 192.168.2.10:ossecdb ERROR: relation "lala" does not exist
Authentication error:
[2007-09-01 19:08:49.862 ADT] : LOG: connection received: host=192.168.2.99 port=37142
[2007-09-01 19:08:49.869 ADT] 192.168.2.99: FATAL: password authentication failed for user "ossec_user"
The actual regexp debugging process1. Start out simpleGestalt:tmp dk$ regexp.py postgresql.log.txt "(^.*$)" 0 | tail -n 2 ['[2007-09-01 19:08:49.869 ADT] 192.168.2.99: FATAL: password authentication failed for user "ossec_user"', '', ''] Counted 25 lines. Matched 25 lines. Gestalt:tmp dk$
The tail after the command is there because by default it shows all the matching lines, and one is enough for our sample. 2. Basic separation
Gestalt:tmp dk$ regexp.py postgresql.log.txt "^\[(?P<date>\S+\s+\S+)\.\d+\s+ADT\]\s+(.*)$" 0
^\[(?P<date>\S+\s+\S+)\.\d+\s+ADT\]\s+(.*)$
[('2007-08-31 19:22:21', ':[unknown] LOG: connection received: host=192.168.2.99 port=52136')]
[('2007-08-31 19:22:21', '192.168.2.99:ossecdb LOG: connection authorized:
user=ossec_user database=ossecdb')]
[('2007-08-31 19:22:22', '192.168.2.99:ossecdb LOG: disconnection: session time: 0:00:00.95
user=ossec_user database=ossecdb host=192.168.2.99 port=52136')]
[('2007-09-27 11:02:44', '192.168.2.10:ossecdb ERROR: relation "lala" does not exist')]
[('2007-09-27 11:02:46', '192.168.2.10:ossecdb LOG: disconnection: session time: 0:00:35.79
user=ossec_user database=ossecdb host=192.168.2.10 port=3584')]
[('2007-09-01 07:14:41', ': LOG: autovacuum: processing database "template1"')]
[('2007-09-01 07:15:41', ': LOG: autovacuum: processing database "ossecdb"')]
[('2007-09-01 16:44:49', "192.168.2.10:ossecdb LOG: duration: 4.550 ms statement:
SELECT id FROM location WHERE name = 'enigma->/var/log/messages' AND server_id = '1'")]
[('2007-09-01 16:44:49', "192.168.2.10:ossecdb LOG: duration: 5.252 ms statement:
INSERT INTO location(server_id, name) VALUES ('1', 'enigma->/var/log/messages')")]
[('2007-09-01 16:44:49', "192.168.2.10:ossecdb LOG: duration: 0.016 ms statement:
SELECT id FROM location WHERE name = 'enigma->/var/log/messages' AND server_id = '1'")]
[('2007-09-27 11:02:51', "192.168.2.10:ossecdb LOG: statement:
INSERT INTO alert(id,server_id,rule_id,timestamp,location_id,src_ip)
VALUES ('3577', '1', '50503','1190916566', '140', '0')")]
[('2007-08-31 19:17:42', '192.168.2.99:test ERROR: relation "alertaaa" does not exist')]
[('2007-08-31 19:17:46', '192.168.2.99:test ERROR: syntax error at or near "a" at character 1')]
[('2007-09-27 11:02:44', '192.168.2.10:ossecdb ERROR: relation "lala" does not exist')]
[('2007-09-01 19:08:49', ': LOG: connection received: host=192.168.2.99 port=37142')]
[('2007-09-01 19:08:49', '192.168.2.99: FATAL: password authentication failed for user "ossec_user"')]
Counted 25 lines.
Matched 16 lines.
Gestalt:tmp dk$
Here we make a first attempt at separating the original datetime from the rest of the line. Additionally we've filtered out the junk lines that might appear in the file, so now we only match 16 lines (the actual 16 lines that contain valid log files). Out regexp starts looking uglier now, but still understandable ;-): ^\[(?P<date>\S+\s+\S+)\.\d+\s+ADT\]\s+(.*)$ 3. Extracting the database host and actual DB name
Gestalt:tmp dk$ regexp.py postgresql.log.txt
"^\[(?P<date>\S+\s+\S+)\.\d+\s+ADT\]\s+(?P<dbhost>[^:]+)?:(?P<dbname>\S+)?\s+(.*)$" 0 | tail -n 3
[('2007-09-01 19:08:49', '192.168.2.99', '', 'FATAL: password authentication failed for user "ossec_user"')]
Counted 25 lines.
Matched 16 lines.
This time I'm facing the first problems. In order to get the output shown above I had to go through a bit of try and error. Gestalt:tmp dk$ regexp.py postgresql.log.txt "^\[(?P<date>\S+\s+\S+)\.\d+\s+ADT\]\s+([^:]+):(\S+)\s+(.*)$" 0 | tail -n 2 Counted 25 lines. Matched 11 lines. Gestalt:tmp dk$ Gestalt:tmp dk$ regexp.py postgresql.log.txt "^\[(?P<date>\S+\s+\S+)\.\d+\s+ADT\]\s+([^:]+):(\S+)\s+(.*)$" 1 | grep "Not matched: \[" Not matched: [2007-08-31 19:22:21.469 ADT] :[unknown] LOG: connection received: host=192.168.2.99 port=52136 Not matched: [2007-09-01 07:14:41.062 ADT] : LOG: autovacuum: processing database "template1" Not matched: [2007-09-01 07:15:41.079 ADT] : LOG: autovacuum: processing database "ossecdb" Not matched: [2007-09-01 19:08:49.862 ADT] : LOG: connection received: host=192.168.2.99 port=37142 Not matched: [2007-09-01 19:08:49.869 ADT] 192.168.2.99: FATAL: password authentication failed for user "ossec_user" Gestalt:tmp dk$ On the matching log line we can also see how the db host and name are being correctly extracted ;-). 4. Got a plugin_sid value
Gestalt:tmp dk$ regexp.py postgresql.log.txt
"^\[(?P<date>\S+\s+\S+)\.\d+\s+ADT\]\s+(?P<dbhost>[^:]+)?:(?P<dbname>\S+)?\s+(?P<sid>[^:]+):\s+(?P<log_msg>.*)$" 0 | tail -n 3
[('2007-09-01 19:08:49', '192.168.2.99', '', 'FATAL', 'password authentication failed for user "ossec_user"')]
Counted 25 lines.
Matched 16 lines.
Gestalt:tmp dk$
Looking at the remainder string it starts getting obvious that the next field is sort of a priority value. In this first phase the postfix plugin will remain somewhat crippled since I'm going to use that field as plugin sid, instead of identifying each of the log types by what they actually do. (That's left as a task for the reader).
Final Roundup
The plugin is far from being finished, but the goal of this tutorial was to demistify the regexp part a bit, since it's actually only based on a bit of research about the log (possible values for fields), a bit of intuition and lots of try && error. To be continued... (finishing the regexp, writing the .cfg file and writing the .sql file)
posted at: 11:38 | path: /ossim/tutorials | permanent link to this entry | 2 comments |
* Posted by Gabriel at Tue Mar 11 12:04:29 2008
hello
For curious readers i would suggest reading the links on this page: http://swtch.com/~rsc/regexp/ There are links to various regular expression implementations in C, comenting about performance, functionality and theory.
* Posted by Calvin Mills at Wed Mar 26 18:40:52 2008
I have a plugin that has been working with only one regex and sid test. But after reading this I figured I would add another test to the plugin, and when I tested the regex against the log it works fine, but when I insert it into my plugin, it never gets caught???
Everything is still caught by the first regex sid 1. Is there a particular order that they need to be in? I have added them to database, and I have restarted many times. I am testing the line Mar 26 11:20:33 192.168.0.13 home kernel: quickcam: frame lost with the regex (?P<logline>(\S+\s+\d+\s+\d\d:\d\d:\d\d)\s+(?P<sensor>[^\s]+)\s+(?P<source>[^\s]+)\s+(?P<generator>[^\:]*):(?P<logged_event>.*? quickcam.*))$ and my full plugin config is ;; syslog ;; plugin_id: 4007 ;; type: detector ;; [DEFAULT] plugin_id=4007 [config] type=detector enable=yes source=log # Enable syslog to log everything to one file. Add it to log rotation also. # echo "*.* /var/log/all.log" >> /etc/syslog.conf; killall -HUP syslogd location=/var/log/all.log # create log file if it does not exists, # otherwise stop processing this plugin create_file=true process=syslogd ; -r || -u start=no ; launch plugin process when agent starts stop=no ; shutdown plugin process when agent stops startup=/etc/init.d/%(process)s start shutdown=/etc/init.d/%(process)s stop ## rules [syslog - datamining - fallthrough] # event_type=event regexp="^(?P<logline>(\S+\s+\d+\s+\d\d:\d\d:\d\d)\s+(?P<sensor>[^\s]+)\s+(?P<source>[^\s]+)\s+(?P<generator>[^\:]*):(?P<logged_event>.*))$" sensor={resolv($sensor)} date={normalize_date($1)} plugin_sid=1 #userdata1={md5sum($logline)} userdata1={$source} userdata2={$generator} userdata3={$logged_event} [syslog - datamining - quickcam] # event_type=event regexp="^(?P<logline>(\S+\s+\d+\s+\d\d:\d\d:\d\d)\s+(?P<sensor>[^\s]+)\s+(?P<source>[^\s]+)\s+(?P<generator>[^\:]*):(?P<logged_event>.*? quickcam.*))$" sensor={resolv($sensor)} date={normalize_date($1)} plugin_sid=3 #userdata1={md5sum($logline)} userdata1={$source} userdata2={$generator} userdata3={$logged_event}
|
Categories
/ (66) Dominique Karg (feel free to get in touch) Friend's blogs:
Archives
2010-Apr Tags | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||



