Re: pcre for stk is available

From: Brian Denheyer <briand_at_deldotd.com>
Date: Thu, 25 Feb 1999 09:37:40 -0800 (PST)

>>>>> "Harvey" == Harvey J Stein <hjstein_at_bfr.co.il> writes:

    Harvey> Brian Denheyer <briand_at_deldotd.com> writes:
>> Further testing shows that the packages are closer than I previously
>> thought, with pcre almost always slightly faster than the current
>> regexp package.
>>
>> Well, I couldn't resist so I put the same benchmark program in perl
>> and it ran 2x faster than the stk version :-(. Then again, there are
>> lies, damn lies and benchmarks.

    Harvey> How did you test it? One make-regexp with lots of string
    Harvey> matches, I hope? You can try using my wrappers package to
    Harvey> see where the time is spent in the STk version. It might
    Harvey> be in the file reading, not in the regexp execution, for
    Harvey> example.

Basically yes. I compiled one regexp and matched it against 8 different
strings (some match, some don't), calling the test routine with the same
arguments over and over.

No file IO.
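
The compile-once setup described above can be sketched in plain Scheme.
This is only an illustration, not the benchmark itself: `fake-compile` is a
hypothetical stand-in for string->regexp / string->p-regexp that just counts
how often it is called, so it shows the call pattern rather than real regexp
timings.

```scheme
;; Hypothetical stand-in for a regexp compiler: returns a matcher
;; procedure and counts each "compile". It only tests string equality.
(define compile-count 0)

(define (fake-compile pattern)
  (set! compile-count (+ compile-count 1))
  (lambda (str) (string=? pattern str)))

;; Compile once, then match every string NTRIALS times.
(define (run-hoisted pattern strs ntrials)
  (let ((matcher (fake-compile pattern))
        (len (vector-length strs)))
    (do ((i 0 (+ i 1)))
        ((>= i ntrials))
      (do ((j 0 (+ j 1)))
          ((>= j len))
        (matcher (vector-ref strs j))))))

;; Recompile inside the inner loop: same matches, many more compiles.
(define (run-unhoisted pattern strs ntrials)
  (let ((len (vector-length strs)))
    (do ((i 0 (+ i 1)))
        ((>= i ntrials))
      (do ((j 0 (+ j 1)))
          ((>= j len))
        ((fake-compile pattern) (vector-ref strs j))))))

(define sample (vector "ab" "cd" "ab"))

(set! compile-count 0)
(run-hoisted "ab" sample 1000)
(define hoisted-compiles compile-count)

(set! compile-count 0)
(run-unhoisted "ab" sample 1000)
(define unhoisted-compiles compile-count)
```

With 3 strings and 1000 trials, the hoisted version compiles once and the
other compiles 3000 times. If compilation were inside the loop, it could
dominate the measurement, which is what Harvey's question is checking.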

I'm assuming your wrappers package is on the STk site somewhere?

Brian

;; A very simple test of the pcre vs. the "stock" regexp package.

;; load the module
(require "pcre")

(define test-p-pattern
  (lambda (pattern-str str options)
    (let ((p-pat (string->p-regexp pattern-str options)))
      (format #t "pattern : ~A string : ~A\n" pattern-str str)
      (format #t "result : ~A\n" (p-pat str)))))
    
;; our handy-dandy test routine
;;
;; PAT is the regular expression to use for matching
;; STRS is a vector of strings to match
;; NTRIALS is the number of times to run the matches
;; DISP is #t if you want the routine to print info

(define test
  (lambda (pat strs ntrials disp)
    (time
     (let ((len (vector-length strs)))
       (do ((i 0 (+ i 1)))
           ((>= i ntrials))
         (do ((j 0 (+ j 1)))
             ((>= j len))
           (let ((str (vector-ref strs j)))
             (if disp
                 (begin
                   (format #t "string : \"~A\"\n" str)
                   (format #t "match : ~A\n" (pat str)))
                 (pat str)))))))))

;; the pattern to use

(define p-str "a([a-z]+)([^0-9]*)([0-9]+)6")

;; the test strings, _very_ little thought went into the choice of
;; strings

(define strings
  (vector
   "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa643539a6"
   "1233333aass45"
   "aslkfj12345sdlkfsklfjkfljslkdfjlskjfsfsldfj6"
   "aslkfj123456sd"
   "abcdefg12345678"
   "a123456ssfsofsjgljgljgljgslkgjsdkl789"
   "a123456sdfkljsflksjfsjflkjfl789a112"
   "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa96"))

;; try to compile a pattern that is broken...

;(string->p-regexp "[a")

;; the original version
(define pat (string->regexp p-str))
(display (regexp? pat))
(newline)

;; the pcre version
(define p-pat (string->p-regexp p-str))
(display (p-regexp? p-pat))
(newline)

(format #t "pattern string \"~A\"\n" p-str)


(define ntrials 1000)

;; this takes about 75 s on my k6-233

(display "native regexp\n")
(test pat strings 1 #t)
(test pat strings ntrials #f)

;; this takes about 60 s

(display "pcre regexp\n")
(test p-pat strings 1 #t)
(test p-pat strings ntrials #f)

;; now try matching some patterns using different options

;; case insensitive

(test-p-pattern "AB" "ab" 0)
(test-p-pattern "AB" "ab" pcre_caseless)

;; anchored, i.e. matching must start at the beginning of the string,
;; same as using ^

(test-p-pattern "AB" "cdefABfedc" 0)
(test-p-pattern "AB" "cdefABfedc" pcre_anchored)

(exit)
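
The option tests at the end of the script rely on what "caseless" and
"anchored" mean. As a rough model for literal patterns only, the semantics
can be sketched in portable Scheme. The helpers `literal-at?` and
`literal-match?` are hypothetical and are not part of the pcre module; they
only mimic how a literal such as "AB" would be expected to behave under each
option.

```scheme
;; Hypothetical helper, not part of pcre: does literal PAT occur in STR
;; starting exactly at index START? CASELESS? selects char-ci=?.
(define (literal-at? pat str start caseless?)
  (let ((plen (string-length pat))
        (same? (if caseless? char-ci=? char=?)))
    (and (<= (+ start plen) (string-length str))
         (let loop ((k 0))
           (or (= k plen)
               (and (same? (string-ref pat k)
                           (string-ref str (+ start k)))
                    (loop (+ k 1))))))))

;; Hypothetical helper: ANCHORED? restricts the attempt to index 0,
;; otherwise each start position is tried in turn.
(define (literal-match? pat str caseless? anchored?)
  (if anchored?
      (literal-at? pat str 0 caseless?)
      (let loop ((start 0))
        (and (<= (+ start (string-length pat)) (string-length str))
             (or (literal-at? pat str start caseless?)
                 (loop (+ start 1)))))))

;; The same four cases as the option tests in the script.
(define r-plain    (literal-match? "AB" "ab" #f #f))
(define r-caseless (literal-match? "AB" "ab" #t #f))
(define r-floating (literal-match? "AB" "cdefABfedc" #f #f))
(define r-anchored (literal-match? "AB" "cdefABfedc" #f #t))
```

In this model "AB" matches "ab" only when caseless, and matches
"cdefABfedc" only when not anchored.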
Received on Thu Feb 25 1999 - 18:38:31 CET
