File search using regex
Tim Wegener
twegener at fastmail.fm
Thu Apr 19 12:08:35 CST 2007
On Thu, 19 Apr 2007 11:41:35 +0930 (CST)
Chris Foote <chris at inetd.com.au> wrote:
> hehe... Perl's grep function is good.
>
> This is close:
>
> #!/usr/bin/python
> # usage: search.py files...
> import sys, re
>
> pattern = re.compile(r'\n?!D2\nInvoice\n!C\nAUSTALIA EIGHT')
> print '\n'.join([f for f in sys.argv[1::] if pattern.search(file(f).read())])
Apart from the leading "\n?", that pattern uses no regular expression features, and an optional prefix makes no difference to whether search() finds a match. So you could drop it and reduce the script to a plain substring test:
# grep_test.py
import sys
pattern = '!D2\nInvoice\n!C\nAUSTALIA EIGHT'
print '\n'.join(f for f in sys.argv[1:] if pattern in open(f).read())
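For anyone who wants something a bit more defensive, here is a rough sketch of the same substring test, assuming a Python 3 interpreter. The script name, the function name files_containing and the PATTERN constant are my own inventions. It reads in binary mode and closes each file explicitly, and it leaves out the leading "\n?" from Chris's regex, since an optional prefix cannot change whether a search matches.

```python
# grep_sketch.py (hypothetical name); usage: python3 grep_sketch.py files...
import sys

# The literal being searched for, as bytes, without the regex-only "\n?" prefix.
PATTERN = b'!D2\nInvoice\n!C\nAUSTALIA EIGHT'


def files_containing(pattern, paths):
    """Return the paths whose raw contents include the bytes `pattern`."""
    matches = []
    for path in paths:
        # Binary mode avoids decoding errors on non-text input such as a .bz2 file.
        with open(path, 'rb') as handle:
            if pattern in handle.read():
                matches.append(path)
    return matches


if __name__ == '__main__':
    # Print one matching filename per line, like the one-liner above.
    print('\n'.join(files_containing(PATTERN, sys.argv[1:])))
```

This still reads each file fully into memory, exactly like the one-liner; it only tidies up the file handling.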
Perl still wins in terms of speed, though only narrowly against Python 2.5; Python 2.4 is roughly ten times slower.
$ time python2.5 grep_test.py ~/spam_training/spam_corpus.tar.bz2
real 0m1.793s
user 0m0.469s
sys 0m1.246s
$ time python2.4 grep_test.py ~/spam_training/spam_corpus.tar.bz2
real 0m17.606s
user 0m16.166s
sys 0m1.263s
$ time perl search.pl ~/spam_training/spam_corpus.tar.bz2
real 0m1.637s
user 0m0.355s
sys 0m1.278s
(I reran these a few times to make sure that caching, etc, didn't influence the numbers.)
(That bz2 file is 433MiB.)
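Since read() pulls the whole file into memory, which is noticeable at that size, one alternative is to scan in fixed-size chunks and carry over enough bytes to catch a match that straddles a chunk boundary. This is only a sketch, assuming Python 3; file_contains and chunk_size are names I made up, and I have not timed it, so I make no claim about how it compares with the numbers above.

```python
def file_contains(path, needle, chunk_size=1 << 20):
    """Return True if the bytes `needle` occur anywhere in the file at `path`."""
    # A match can start at most len(needle) - 1 bytes before a chunk boundary,
    # so that many trailing bytes are carried over into the next window.
    overlap = max(len(needle) - 1, 0)
    tail = b''
    with open(path, 'rb') as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                # End of file reached without finding the needle.
                return False
            window = tail + chunk
            if needle in window:
                return True
            # Keep only the bytes that could begin a match finishing in the next chunk.
            tail = window[-overlap:] if overlap else b''
```

Memory use then stays at roughly chunk_size plus the length of the needle, rather than the size of the file.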
Tim