LinuxSA Mailing list archives

  From: Dan <dan.kortschak@adelaide.edu.au>
  To  : <linuxsa@linuxsa.org.au>
  Date: Tue, 27 Feb 2001 18:45:07 +1030

Re: sloppy grep in perl?

OK, the problem is actually rather hard, and as far as I can see none of the
grep-like tools I know will do it without specifying all of the places where
mistakes can occur, and for even short sequences this becomes prohibitive. Here
is a hard (in both senses) example from a pattern matcher called (inventively)
scan_for_matches:

CCCCCGGGCTGCAG[2,1,1] GAATTC[2,1,1] 100...1000 CTCGAG[2,1,1] GGGGGGCCCGGT[2,1,1]

As you may be able to tell, it's looking at DNA sequences. It matches the first
sequence allowing up to two mismatches and a deviation from the specified
length of one character in either direction, the same for the second,
followed by /.{100,1000}/, then the same as the first and second for the fourth
and fifth.
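
To illustrate why spelling out every place a mistake can occur gets out of hand, here is a minimal Python sketch (not part of scan_for_matches; the helper name mismatch_regex is made up for this example). It handles mismatches only, by building one regexp alternative per choice of positions replaced with '.'. Insertions and deletions would multiply the alternatives further.

```python
import re
from itertools import combinations

def mismatch_regex(seq, max_mismatches):
    """Build a regexp matching seq with at most max_mismatches substitutions.

    Hypothetical helper: each alternative replaces one chosen set of
    positions with '.', so there are C(len(seq), max_mismatches) of them.
    """
    alternatives = []
    for positions in combinations(range(len(seq)), max_mismatches):
        chars = list(seq)
        for p in positions:
            chars[p] = "."  # '.' also matches the correct base
        alternatives.append("".join(chars))
    return "(?:" + "|".join(alternatives) + ")"

first = "CCCCCGGGCTGCAG"  # first subpattern from the example above
pattern = mismatch_regex(first, 2)
n_alternatives = pattern.count("|") + 1
print(n_alternatives, "alternatives for", len(first), "bases")
```

For the 14-base first subpattern with two mismatches this already produces 91 alternatives, before any allowance for length changes.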
The important difference between this and the solutions below is that mismatches
and insertions/deletions add up over a subpattern, so I can specify that I must
have at least, say, 75% of the characters right according to the
pattern. As far as I can see this would not be possible with a regexp (am I
wrong?).
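
The idea of an error budget that accumulates over a whole subpattern can be sketched as a small recursive matcher. This is only an illustration in Python, not how scan_for_matches is implemented. The function name approx_match_ends and its keyword arguments are assumptions made for the example. Here an "insertion" means an extra character in the text, and a "deletion" means a pattern character missing from the text.

```python
from functools import lru_cache

def approx_match_ends(pattern, text, start=0, *,
                      mismatches=0, insertions=0, deletions=0):
    """Return the set of end offsets where pattern matches text from start,
    spending at most the given budgets over the whole pattern."""

    @lru_cache(maxsize=None)
    def walk(i, j, mm, ins, dele):
        # i indexes the pattern, j indexes the text
        if i == len(pattern):
            return frozenset([j])
        ends = set()
        if j < len(text):
            if pattern[i] == text[j]:
                ends |= walk(i + 1, j + 1, mm, ins, dele)
            elif mm > 0:
                # substitution: consume one of each, spend a mismatch
                ends |= walk(i + 1, j + 1, mm - 1, ins, dele)
            if ins > 0:
                # extra character in the text that the pattern lacks
                ends |= walk(i, j + 1, mm, ins - 1, dele)
        if dele > 0:
            # pattern character that is missing from the text
            ends |= walk(i + 1, j, mm, ins, dele - 1)
        return frozenset(ends)

    return set(walk(0, start, mismatches, insertions, deletions))

# The toy cases from the quoted message, in this sketch's terminology:
print(approx_match_ends("fao", "foo", mismatches=1))
print(approx_match_ends("fooo", "foo", deletions=1))
```

Because the budgets are passed down the recursion rather than fixed per position, errors can fall anywhere in the subpattern as long as the totals are respected.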
I hear you asking why I don't just use scan_for_matches: well, the syntax is
unpleasant and pattern replacement would be nice (to do all this I need to layer
regexps over the top of scan_for_matches, and the syntax becomes even more
unpleasant). Of course, maybe I'm just being picky.

thanks
Dan


Andrew Hill wrote:
> 
> Dan Kortschak wrote:
> 
> > eg (making up a syntax as I go along). /(fao){1,0,0}/ would match foo as
> > {1,0,0} specifies allow one mismatch
> 
> How about: /f.o/
> 
> > and /(fooo){0,1,0}/ would also match
> > foo as {0,1,0} specifies allow one insertion, etc.
> 
> How about: /fo.?o/
> 
> Hard to be sure, as I'm not really clear on what you are after - perhaps
> if you can send me some more examples of what you want to match and what
> you don't, I can help? (Although it won't be until tomorrow - off home
> soon :-)
> 
> Cheers,
> 
> --
> Andrew Hill
-- 
_____________________________________________________________   .`.`o     
                                                         o| ,\__ `./`r
  Dan Kortschak                                          <\/    \_O> O    
  Genetics (DMB)          phone :+61 8 8303 4863          "|`...'.\
  Adelaide University     fax   :+61 8 8303 4399           `      :\ 
  Australia 5005          mailto:dan.kortschak@adelaide.edu.au    : \

  "Nullius addictus iurare in verba magistri." -- Horace
