Fast find, between two numbers, in a list

Q&A's, tips, howto's
Locked
kanen
Posts: 145
Joined: Thu Mar 25, 2010 6:24 pm
Contact:

Fast find, between two numbers, in a list

Post by kanen »

I have a very large list, over 100,000 entries.

It's in the form: ("name" number1 number2)

I'd like to take a value and find the "name" that occurs between the two numbers.

For example;

("kanen" 100 200)

Such that, the number 150 would fall within the above range and return "kanen"

I've tried several different approaches, but non of them are fast enough.

Any ideas?

[edited to add helpful example]
. Kanen Flowers http://kanen.me .

cormullion
Posts: 2038
Joined: Tue Nov 29, 2005 8:28 pm
Location: latiitude 50N longitude 3W
Contact:

Re: Fast find, between two numbers, in a list

Post by cormullion »

Do you mean:

Code: Select all

(("n1" 100 200)
 ("n2" 201 300)
 ("n3" 301 400)
)
and then looking for what matches, say, 360?

I remember asking a similar question.

OK, it wasn't very similar... :)

itistoday
Posts: 429
Joined: Sun Dec 02, 2007 5:10 pm
Contact:

Re: Fast find, between two numbers, in a list

Post by itistoday »

One of newLISP's weaknesses is that its memory management model (ORO), which makes it difficult to implement advanced, efficient data-structures in the language itself. I would use a tree-like structure to solve this problem, but newLISP doesn't have one. What you can do is write one, or use an existing one, that's written in C, and then call that code from newLISP.

Use a BST on the first number (the lower bound), then when you find the node you're looking for, make it so that the result of that is another BST that does searching based on the second number (the upper bound), with the result being the string. I think that would be a quick way of getting what you want.
Get your Objective newLISP groove on.

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California
Contact:

Re: Fast find, between two numbers, in a list

Post by Lutz »

If data are as described by Cormullion and are sorted, you could use 'find' with a compare function, looking for the first entry where x is bigger than the first number and smaller than the second.

If this is to slow, you could make an array from the data list:

Code: Select all

(set 'data-array (array (length data) 3 (flat data)))
Then use a binary search on the array:

http://en.wikipedia.org/wiki/Binary_search_algorithm

and if intervals are contiguous, you could throw out the second number.

If you tell us more about the nature of the data, perhaps that would lead to a solution in a whole different direction?

kanen
Posts: 145
Joined: Thu Mar 25, 2010 6:24 pm
Contact:

Re: Fast find, between two numbers, in a list

Post by kanen »

Yes, the data looks like Cormullion suggested.

Here's an example;

Code: Select all

(set 'loc '(
  ("41.0.0.0" "2097152" 687865856 689963008 "ZA") 
  ("41.32.0.0" "1048576" 689963008 691011584 "EG") 
  ("41.48.0.0" "524288" 691011584 691535872 "ZA") 
  ("41.56.0.0" "65536" 691535872 691601408 "ZA") 
  ("41.57.0.0" "65536" 691601408 691666944 "ZA") 
  ("41.58.0.0" "65536" 691666944 691732480 "NG") 
  ("41.59.0.0" "65536" 691732480 691798016 "TZ") 
  ("41.64.0.0" "131072" 692060160 692191232 "EG") 
  ("41.66.0.0" "16384" 692191232 692207616 "CI") ))
I am finding numbers between S and E and return R, as represented here: '(x x S E R)

So, 689963099 would return "EG"

How about a quick, example? My attempts are quite slow and painful.
Lutz wrote:If data are as described by Cormullion and are sorted, you could use 'find' with a compare function, looking for the first entry where x is bigger than the first number and smaller than the second.

...
. Kanen Flowers http://kanen.me .

Kazimir Majorinc
Posts: 388
Joined: Thu May 08, 2008 1:24 am
Location: Croatia
Contact:

Re: Fast find, between two numbers, in a list

Post by Kazimir Majorinc »

Something like this one?

Code: Select all

(set 'loc '(
  ("41.0.0.0" "2097152" 687865856 689963008 "ZA")
  ("41.32.0.0" "1048576" 689963008 691011584 "EG")
  ("41.48.0.0" "524288" 691011584 691535872 "ZA")
  ("41.56.0.0" "65536" 691535872 691601408 "ZA")
  ("41.57.0.0" "65536" 691601408 691666944 "ZA")
  ("41.58.0.0" "65536" 691666944 691732480 "NG") 
  ("41.59.0.0" "65536" 691732480 691798016 "TZ")
  ("41.64.0.0" "131072" 692060160 692191232 "EG")
  ("41.66.0.0" "16384" 692191232 692207616 "CI") ))

(set 'loca (array (length loc) loc))
(println loca)

(while true
   (setf found nil miny 0 maxy (- (length loca) 1))
   (set 'n (int (read-line)))
   (until found
      (setf t (div (add miny maxy) 2) rt (round t))
      (println "miny=" miny ", maxy=" maxy ", t=" t ", rt=" rt "-->" (loca rt))
      (if (and (<= ((loca rt) 2) n)
               (<= n ((loca rt) 3)))
          (begin (println "found!")
                 (set 'found true))
          (if (< n ((loca rt) 2))
              (setf maxy t)
              (setf miny t)))))
(exit)
(("41.0.0.0" "2097152" 687865856 689963008 "ZA") ("41.32.0.0" "1048576" 689963008
691011584 "EG")
("41.48.0.0" "524288" 691011584 691535872 "ZA")
("41.56.0.0" "65536" 691535872 691601408 "ZA")
("41.57.0.0" "65536" 691601408 691666944 "ZA")
("41.58.0.0" "65536" 691666944 691732480 "NG")
("41.59.0.0" "65536" 691732480 691798016 "TZ")
("41.64.0.0" "131072" 692060160 692191232 "EG")
("41.66.0.0" "16384" 692191232 692207616 "CI"))
687865856
miny=0, maxy=8, t=4, rt=4-->("41.57.0.0" "65536" 691601408 691666944 "ZA")
miny=0, maxy=4, t=2, rt=2-->("41.48.0.0" "524288" 691011584 691535872 "ZA")
miny=0, maxy=2, t=1, rt=1-->("41.32.0.0" "1048576" 689963008 691011584 "EG")
miny=0, maxy=1, t=0.5, rt=0-->("41.0.0.0" "2097152" 687865856 689963008 "ZA")
found!
689963008
miny=0, maxy=8, t=4, rt=4-->("41.57.0.0" "65536" 691601408 691666944 "ZA")
miny=0, maxy=4, t=2, rt=2-->("41.48.0.0" "524288" 691011584 691535872 "ZA")
miny=0, maxy=2, t=1, rt=1-->("41.32.0.0" "1048576" 689963008 691011584 "EG")
found!
692191232
miny=0, maxy=8, t=4, rt=4-->("41.57.0.0" "65536" 691601408 691666944 "ZA")
miny=4, maxy=8, t=6, rt=6-->("41.59.0.0" "65536" 691732480 691798016 "TZ")
miny=6, maxy=8, t=7, rt=7-->("41.64.0.0" "131072" 692060160 692191232 "EG")
found!
692207616
miny=0, maxy=8, t=4, rt=4-->("41.57.0.0" "65536" 691601408 691666944 "ZA")
miny=4, maxy=8, t=6, rt=6-->("41.59.0.0" "65536" 691732480 691798016 "TZ")
miny=6, maxy=8, t=7, rt=7-->("41.64.0.0" "131072" 692060160 692191232 "EG")
miny=7, maxy=8, t=7.5, rt=8-->("41.66.0.0" "16384" 692191232 692207616 "CI")
found!
691601409
miny=0, maxy=8, t=4, rt=4-->("41.57.0.0" "65536" 691601408 691666944 "ZA")
found!
692060169
miny=0, maxy=8, t=4, rt=4-->("41.57.0.0" "65536" 691601408 691666944 "ZA")
miny=4, maxy=8, t=6, rt=6-->("41.59.0.0" "65536" 691732480 691798016 "TZ")
miny=6, maxy=8, t=7, rt=7-->("41.64.0.0" "131072" 692060160 692191232 "EG")
found!

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California
Contact:

Re: Fast find, between two numbers, in a list

Post by Lutz »

Fast, binary search! but be careful with rounding, you could get stuck in a loop. It is better to either use 'floor' or use integer arithmetic, which will floor to the next lower integer automatically.

Another hazard are holes in the data, e.g. between these entries:

Code: Select all

("41.59.0.0" "65536" 691732480 691798016 "TZ")
("41.64.0.0" "131072" 692060160 692191232 "EG")
… a natural for IP geo databases, but could get the binary search stuck in a loop when a number falls into such a hole, e.g.: 69180123.

You could either fill these holes with "N/A" entries or make the algorithm recursive (see my earlier WikiPedia link), and trap recursive depth. After 20 - for a milion size table - use a: (throw "N/A"). This saves you the work of filling out the holes in your IP database :-). A well written function should be able to serve thousands of IP-to-location lookups per second.

Kazimir Majorinc
Posts: 388
Joined: Thu May 08, 2008 1:24 am
Location: Croatia
Contact:

Re: Fast find, between two numbers, in a list

Post by Kazimir Majorinc »

When we're at that - bug discovered:

newLISP v.10.2.5 on Win32 IPv4, execute 'newlisp -h' for more info.
> (round 0.5)
0
> (round 1.5)
2
>

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California
Contact:

Re: Fast find, between two numbers, in a list

Post by Lutz »

Not a bug, but a different rounding scheme for floating point numbers:

Floating point numbers are nor real decimals. Rounding depends on the bit pattern used to represent the floating point number as a binary number. Depending on the whole bit pattern the decimal it represents, may be bigger or smaller, which then affects rounding. In newLISP, as in C, rounding policy is always round-to-nearest.

For a more detailed explanation see also here:

http://en.wikipedia.org/wiki/Rounding#R ... lf_to_even

itistoday
Posts: 429
Joined: Sun Dec 02, 2007 5:10 pm
Contact:

Re: Fast find, between two numbers, in a list

Post by itistoday »

What's the function to round 0.5 to 1?
Get your Objective newLISP groove on.

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California
Contact:

Re: Fast find, between two numbers, in a list

Post by Lutz »

... that would be:

http://en.wikipedia.org/wiki/Rounding#R ... _from_zero

Code: Select all

(floor (add x 0.5)) ; for positive x

itistoday
Posts: 429
Joined: Sun Dec 02, 2007 5:10 pm
Contact:

Re: Fast find, between two numbers, in a list

Post by itistoday »

So I suppose this will always behave the way Kazimir was expecting (for any x)?

Code: Select all

(sub (round (add 1 x)) 1)
Get your Objective newLISP groove on.

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California
Contact:

Re: Fast find, between two numbers, in a list

Post by Lutz »

It is hard to make an obvious rule for the behavior of 'round' without going into the binary representation of the numbers when using round-to-nearest. Consider this:

Code: Select all

> (round 0.05 -1)
0.1
> (round 0.15 -1)
0.1
> (round 0.25 -1)
0.2
> (round 0.35 -1)
0.3
> (round 0.45 -1)
0.5
> 

itistoday
Posts: 429
Joined: Sun Dec 02, 2007 5:10 pm
Contact:

Re: Fast find, between two numbers, in a list

Post by itistoday »

Interesting, thanks for pointing that out. For the ones place it appears consistent though, I think (haven't tested all possibilities, obviously).
Get your Objective newLISP groove on.

Kazimir Majorinc
Posts: 388
Joined: Thu May 08, 2008 1:24 am
Location: Croatia
Contact:

Re: Fast find, between two numbers, in a list

Post by Kazimir Majorinc »

How about this one?

Code: Select all

 (set '[println.supressed] true)
 (load "http://www.instprog.com/Instprog.default-library.lsp") ; only for println=
 (set '[println.supressed] (not true))
 (println "---") 
 
(define (round0 x)
  (if (< (mod x 1) 0.5) (floor x) (ceil x)))
 
(define (round2 x (i 0))
  (if (< x 0)
      (sub (round2 (sub x) i))
      (div (round0 (mul x (pow 10 (sub i))))
           (pow 10 (sub i)))))

(println= (round2 0.5))
(println= (round2 1.5))

(dolist (x '(0.05 0.15 0.25 0.35 0.45 0.95))
   (println= x (round2 x -1)))
   
(println= (round2 -1.49) (round -1.49))
(println= (round2 -1.5) (round -1.5))

(println= (round2 3.1415926 -2.5)); whatever it is

(exit)
---
(round2 0.5)=1;
(round2 1.5)=2;
x=0.05; (round2 x -1)=0.1;
x=0.15; (round2 x -1)=0.2;
x=0.25; (round2 x -1)=0.3;
x=0.35; (round2 x -1)=0.4;
x=0.45; (round2 x -1)=0.5;
x=0.95; (round2 x -1)=1;
(round2 -1.49)=-1; (round -1.49)=-1;
(round2 -1.5)=-2; (round -1.5)=-2;
(round2 3.1415926 -2.5)=3.140141717;

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California
Contact:

Re: Fast find, between two numbers, in a list

Post by Lutz »

… but be aware that this round-half-away-from-zero rounding mode should only be used in a limited type of financial applications, not for financial modeling or for general mathematics type of problems.

In fact for financial accounting floating point numbers shouldn't be used at all but only decimal mode calculations, e.g. using the BCD mode in the i386 family of processors. This is also why Java has a BigDecimal class.

Rounding with above method introduces a systematic bias when adding many rounded numbers in only one, the negative or positive number area.

This bias is not present in newLISP or 'C' and other languages using round-to-nearest type, unbiased rounding as the individual bias evens itself out including when only rounding either positive or negative numbers.

See the Wikipedia article: http://en.wikipedia.org/wiki/Rounding

Locked