Fast find, between two numbers, in a list
Fast find, between two numbers, in a list
I have a very large list, over 100,000 entries.
It's in the form: ("name" number1 number2)
I'd like to take a value and find the "name" that occurs between the two numbers.
For example;
("kanen" 100 200)
Such that, the number 150 would fall within the above range and return "kanen"
I've tried several different approaches, but non of them are fast enough.
Any ideas?
[edited to add helpful example]
It's in the form: ("name" number1 number2)
I'd like to take a value and find the "name" that occurs between the two numbers.
For example;
("kanen" 100 200)
Such that, the number 150 would fall within the above range and return "kanen"
I've tried several different approaches, but non of them are fast enough.
Any ideas?
[edited to add helpful example]
. Kanen Flowers http://kanen.me .
-
- Posts: 2038
- Joined: Tue Nov 29, 2005 8:28 pm
- Location: latiitude 50N longitude 3W
- Contact:
Re: Fast find, between two numbers, in a list
Do you mean:
and then looking for what matches, say, 360?
I remember asking a similar question.
OK, it wasn't very similar... :)
Code: Select all
(("n1" 100 200)
("n2" 201 300)
("n3" 301 400)
)
I remember asking a similar question.
OK, it wasn't very similar... :)
Re: Fast find, between two numbers, in a list
One of newLISP's weaknesses is that its memory management model (ORO), which makes it difficult to implement advanced, efficient data-structures in the language itself. I would use a tree-like structure to solve this problem, but newLISP doesn't have one. What you can do is write one, or use an existing one, that's written in C, and then call that code from newLISP.
Use a BST on the first number (the lower bound), then when you find the node you're looking for, make it so that the result of that is another BST that does searching based on the second number (the upper bound), with the result being the string. I think that would be a quick way of getting what you want.
Use a BST on the first number (the lower bound), then when you find the node you're looking for, make it so that the result of that is another BST that does searching based on the second number (the upper bound), with the result being the string. I think that would be a quick way of getting what you want.
Get your Objective newLISP groove on.
Re: Fast find, between two numbers, in a list
If data are as described by Cormullion and are sorted, you could use 'find' with a compare function, looking for the first entry where x is bigger than the first number and smaller than the second.
If this is to slow, you could make an array from the data list:
Then use a binary search on the array:
http://en.wikipedia.org/wiki/Binary_search_algorithm
and if intervals are contiguous, you could throw out the second number.
If you tell us more about the nature of the data, perhaps that would lead to a solution in a whole different direction?
If this is to slow, you could make an array from the data list:
Code: Select all
(set 'data-array (array (length data) 3 (flat data)))
http://en.wikipedia.org/wiki/Binary_search_algorithm
and if intervals are contiguous, you could throw out the second number.
If you tell us more about the nature of the data, perhaps that would lead to a solution in a whole different direction?
Re: Fast find, between two numbers, in a list
Yes, the data looks like Cormullion suggested.
Here's an example;
I am finding numbers between S and E and return R, as represented here: '(x x S E R)
So, 689963099 would return "EG"
How about a quick, example? My attempts are quite slow and painful.
Here's an example;
Code: Select all
(set 'loc '(
("41.0.0.0" "2097152" 687865856 689963008 "ZA")
("41.32.0.0" "1048576" 689963008 691011584 "EG")
("41.48.0.0" "524288" 691011584 691535872 "ZA")
("41.56.0.0" "65536" 691535872 691601408 "ZA")
("41.57.0.0" "65536" 691601408 691666944 "ZA")
("41.58.0.0" "65536" 691666944 691732480 "NG")
("41.59.0.0" "65536" 691732480 691798016 "TZ")
("41.64.0.0" "131072" 692060160 692191232 "EG")
("41.66.0.0" "16384" 692191232 692207616 "CI") ))
So, 689963099 would return "EG"
How about a quick, example? My attempts are quite slow and painful.
Lutz wrote:If data are as described by Cormullion and are sorted, you could use 'find' with a compare function, looking for the first entry where x is bigger than the first number and smaller than the second.
...
. Kanen Flowers http://kanen.me .
-
- Posts: 388
- Joined: Thu May 08, 2008 1:24 am
- Location: Croatia
- Contact:
Re: Fast find, between two numbers, in a list
Something like this one?
(("41.0.0.0" "2097152" 687865856 689963008 "ZA") ("41.32.0.0" "1048576" 689963008
691011584 "EG")
("41.48.0.0" "524288" 691011584 691535872 "ZA")
("41.56.0.0" "65536" 691535872 691601408 "ZA")
("41.57.0.0" "65536" 691601408 691666944 "ZA")
("41.58.0.0" "65536" 691666944 691732480 "NG")
("41.59.0.0" "65536" 691732480 691798016 "TZ")
("41.64.0.0" "131072" 692060160 692191232 "EG")
("41.66.0.0" "16384" 692191232 692207616 "CI"))
687865856
miny=0, maxy=8, t=4, rt=4-->("41.57.0.0" "65536" 691601408 691666944 "ZA")
miny=0, maxy=4, t=2, rt=2-->("41.48.0.0" "524288" 691011584 691535872 "ZA")
miny=0, maxy=2, t=1, rt=1-->("41.32.0.0" "1048576" 689963008 691011584 "EG")
miny=0, maxy=1, t=0.5, rt=0-->("41.0.0.0" "2097152" 687865856 689963008 "ZA")
found!
689963008
miny=0, maxy=8, t=4, rt=4-->("41.57.0.0" "65536" 691601408 691666944 "ZA")
miny=0, maxy=4, t=2, rt=2-->("41.48.0.0" "524288" 691011584 691535872 "ZA")
miny=0, maxy=2, t=1, rt=1-->("41.32.0.0" "1048576" 689963008 691011584 "EG")
found!
692191232
miny=0, maxy=8, t=4, rt=4-->("41.57.0.0" "65536" 691601408 691666944 "ZA")
miny=4, maxy=8, t=6, rt=6-->("41.59.0.0" "65536" 691732480 691798016 "TZ")
miny=6, maxy=8, t=7, rt=7-->("41.64.0.0" "131072" 692060160 692191232 "EG")
found!
692207616
miny=0, maxy=8, t=4, rt=4-->("41.57.0.0" "65536" 691601408 691666944 "ZA")
miny=4, maxy=8, t=6, rt=6-->("41.59.0.0" "65536" 691732480 691798016 "TZ")
miny=6, maxy=8, t=7, rt=7-->("41.64.0.0" "131072" 692060160 692191232 "EG")
miny=7, maxy=8, t=7.5, rt=8-->("41.66.0.0" "16384" 692191232 692207616 "CI")
found!
691601409
miny=0, maxy=8, t=4, rt=4-->("41.57.0.0" "65536" 691601408 691666944 "ZA")
found!
692060169
miny=0, maxy=8, t=4, rt=4-->("41.57.0.0" "65536" 691601408 691666944 "ZA")
miny=4, maxy=8, t=6, rt=6-->("41.59.0.0" "65536" 691732480 691798016 "TZ")
miny=6, maxy=8, t=7, rt=7-->("41.64.0.0" "131072" 692060160 692191232 "EG")
found!
Code: Select all
(set 'loc '(
("41.0.0.0" "2097152" 687865856 689963008 "ZA")
("41.32.0.0" "1048576" 689963008 691011584 "EG")
("41.48.0.0" "524288" 691011584 691535872 "ZA")
("41.56.0.0" "65536" 691535872 691601408 "ZA")
("41.57.0.0" "65536" 691601408 691666944 "ZA")
("41.58.0.0" "65536" 691666944 691732480 "NG")
("41.59.0.0" "65536" 691732480 691798016 "TZ")
("41.64.0.0" "131072" 692060160 692191232 "EG")
("41.66.0.0" "16384" 692191232 692207616 "CI") ))
(set 'loca (array (length loc) loc))
(println loca)
(while true
(setf found nil miny 0 maxy (- (length loca) 1))
(set 'n (int (read-line)))
(until found
(setf t (div (add miny maxy) 2) rt (round t))
(println "miny=" miny ", maxy=" maxy ", t=" t ", rt=" rt "-->" (loca rt))
(if (and (<= ((loca rt) 2) n)
(<= n ((loca rt) 3)))
(begin (println "found!")
(set 'found true))
(if (< n ((loca rt) 2))
(setf maxy t)
(setf miny t)))))
(exit)
691011584 "EG")
("41.48.0.0" "524288" 691011584 691535872 "ZA")
("41.56.0.0" "65536" 691535872 691601408 "ZA")
("41.57.0.0" "65536" 691601408 691666944 "ZA")
("41.58.0.0" "65536" 691666944 691732480 "NG")
("41.59.0.0" "65536" 691732480 691798016 "TZ")
("41.64.0.0" "131072" 692060160 692191232 "EG")
("41.66.0.0" "16384" 692191232 692207616 "CI"))
687865856
miny=0, maxy=8, t=4, rt=4-->("41.57.0.0" "65536" 691601408 691666944 "ZA")
miny=0, maxy=4, t=2, rt=2-->("41.48.0.0" "524288" 691011584 691535872 "ZA")
miny=0, maxy=2, t=1, rt=1-->("41.32.0.0" "1048576" 689963008 691011584 "EG")
miny=0, maxy=1, t=0.5, rt=0-->("41.0.0.0" "2097152" 687865856 689963008 "ZA")
found!
689963008
miny=0, maxy=8, t=4, rt=4-->("41.57.0.0" "65536" 691601408 691666944 "ZA")
miny=0, maxy=4, t=2, rt=2-->("41.48.0.0" "524288" 691011584 691535872 "ZA")
miny=0, maxy=2, t=1, rt=1-->("41.32.0.0" "1048576" 689963008 691011584 "EG")
found!
692191232
miny=0, maxy=8, t=4, rt=4-->("41.57.0.0" "65536" 691601408 691666944 "ZA")
miny=4, maxy=8, t=6, rt=6-->("41.59.0.0" "65536" 691732480 691798016 "TZ")
miny=6, maxy=8, t=7, rt=7-->("41.64.0.0" "131072" 692060160 692191232 "EG")
found!
692207616
miny=0, maxy=8, t=4, rt=4-->("41.57.0.0" "65536" 691601408 691666944 "ZA")
miny=4, maxy=8, t=6, rt=6-->("41.59.0.0" "65536" 691732480 691798016 "TZ")
miny=6, maxy=8, t=7, rt=7-->("41.64.0.0" "131072" 692060160 692191232 "EG")
miny=7, maxy=8, t=7.5, rt=8-->("41.66.0.0" "16384" 692191232 692207616 "CI")
found!
691601409
miny=0, maxy=8, t=4, rt=4-->("41.57.0.0" "65536" 691601408 691666944 "ZA")
found!
692060169
miny=0, maxy=8, t=4, rt=4-->("41.57.0.0" "65536" 691601408 691666944 "ZA")
miny=4, maxy=8, t=6, rt=6-->("41.59.0.0" "65536" 691732480 691798016 "TZ")
miny=6, maxy=8, t=7, rt=7-->("41.64.0.0" "131072" 692060160 692191232 "EG")
found!
Re: Fast find, between two numbers, in a list
Fast, binary search! but be careful with rounding, you could get stuck in a loop. It is better to either use 'floor' or use integer arithmetic, which will floor to the next lower integer automatically.
Another hazard are holes in the data, e.g. between these entries:
… a natural for IP geo databases, but could get the binary search stuck in a loop when a number falls into such a hole, e.g.: 69180123.
You could either fill these holes with "N/A" entries or make the algorithm recursive (see my earlier WikiPedia link), and trap recursive depth. After 20 - for a milion size table - use a: (throw "N/A"). This saves you the work of filling out the holes in your IP database :-). A well written function should be able to serve thousands of IP-to-location lookups per second.
Another hazard are holes in the data, e.g. between these entries:
Code: Select all
("41.59.0.0" "65536" 691732480 691798016 "TZ")
("41.64.0.0" "131072" 692060160 692191232 "EG")
You could either fill these holes with "N/A" entries or make the algorithm recursive (see my earlier WikiPedia link), and trap recursive depth. After 20 - for a milion size table - use a: (throw "N/A"). This saves you the work of filling out the holes in your IP database :-). A well written function should be able to serve thousands of IP-to-location lookups per second.
-
- Posts: 388
- Joined: Thu May 08, 2008 1:24 am
- Location: Croatia
- Contact:
Re: Fast find, between two numbers, in a list
When we're at that - bug discovered:
newLISP v.10.2.5 on Win32 IPv4, execute 'newlisp -h' for more info.
> (round 0.5)
0
> (round 1.5)
2
>
newLISP v.10.2.5 on Win32 IPv4, execute 'newlisp -h' for more info.
> (round 0.5)
0
> (round 1.5)
2
>
Re: Fast find, between two numbers, in a list
Not a bug, but a different rounding scheme for floating point numbers:
Floating point numbers are nor real decimals. Rounding depends on the bit pattern used to represent the floating point number as a binary number. Depending on the whole bit pattern the decimal it represents, may be bigger or smaller, which then affects rounding. In newLISP, as in C, rounding policy is always round-to-nearest.
For a more detailed explanation see also here:
http://en.wikipedia.org/wiki/Rounding#R ... lf_to_even
Floating point numbers are nor real decimals. Rounding depends on the bit pattern used to represent the floating point number as a binary number. Depending on the whole bit pattern the decimal it represents, may be bigger or smaller, which then affects rounding. In newLISP, as in C, rounding policy is always round-to-nearest.
For a more detailed explanation see also here:
http://en.wikipedia.org/wiki/Rounding#R ... lf_to_even
Re: Fast find, between two numbers, in a list
What's the function to round 0.5 to 1?
Get your Objective newLISP groove on.
Re: Fast find, between two numbers, in a list
... that would be:
http://en.wikipedia.org/wiki/Rounding#R ... _from_zero
http://en.wikipedia.org/wiki/Rounding#R ... _from_zero
Code: Select all
(floor (add x 0.5)) ; for positive x
Re: Fast find, between two numbers, in a list
So I suppose this will always behave the way Kazimir was expecting (for any x)?
Code: Select all
(sub (round (add 1 x)) 1)
Get your Objective newLISP groove on.
Re: Fast find, between two numbers, in a list
It is hard to make an obvious rule for the behavior of 'round' without going into the binary representation of the numbers when using round-to-nearest. Consider this:
Code: Select all
> (round 0.05 -1)
0.1
> (round 0.15 -1)
0.1
> (round 0.25 -1)
0.2
> (round 0.35 -1)
0.3
> (round 0.45 -1)
0.5
>
Re: Fast find, between two numbers, in a list
Interesting, thanks for pointing that out. For the ones place it appears consistent though, I think (haven't tested all possibilities, obviously).
Get your Objective newLISP groove on.
-
- Posts: 388
- Joined: Thu May 08, 2008 1:24 am
- Location: Croatia
- Contact:
Re: Fast find, between two numbers, in a list
How about this one?
---
(round2 0.5)=1;
(round2 1.5)=2;
x=0.05; (round2 x -1)=0.1;
x=0.15; (round2 x -1)=0.2;
x=0.25; (round2 x -1)=0.3;
x=0.35; (round2 x -1)=0.4;
x=0.45; (round2 x -1)=0.5;
x=0.95; (round2 x -1)=1;
(round2 -1.49)=-1; (round -1.49)=-1;
(round2 -1.5)=-2; (round -1.5)=-2;
(round2 3.1415926 -2.5)=3.140141717;
Code: Select all
(set '[println.supressed] true)
(load "http://www.instprog.com/Instprog.default-library.lsp") ; only for println=
(set '[println.supressed] (not true))
(println "---")
(define (round0 x)
(if (< (mod x 1) 0.5) (floor x) (ceil x)))
(define (round2 x (i 0))
(if (< x 0)
(sub (round2 (sub x) i))
(div (round0 (mul x (pow 10 (sub i))))
(pow 10 (sub i)))))
(println= (round2 0.5))
(println= (round2 1.5))
(dolist (x '(0.05 0.15 0.25 0.35 0.45 0.95))
(println= x (round2 x -1)))
(println= (round2 -1.49) (round -1.49))
(println= (round2 -1.5) (round -1.5))
(println= (round2 3.1415926 -2.5)); whatever it is
(exit)
(round2 0.5)=1;
(round2 1.5)=2;
x=0.05; (round2 x -1)=0.1;
x=0.15; (round2 x -1)=0.2;
x=0.25; (round2 x -1)=0.3;
x=0.35; (round2 x -1)=0.4;
x=0.45; (round2 x -1)=0.5;
x=0.95; (round2 x -1)=1;
(round2 -1.49)=-1; (round -1.49)=-1;
(round2 -1.5)=-2; (round -1.5)=-2;
(round2 3.1415926 -2.5)=3.140141717;
Re: Fast find, between two numbers, in a list
… but be aware that this round-half-away-from-zero rounding mode should only be used in a limited type of financial applications, not for financial modeling or for general mathematics type of problems.
In fact for financial accounting floating point numbers shouldn't be used at all but only decimal mode calculations, e.g. using the BCD mode in the i386 family of processors. This is also why Java has a BigDecimal class.
Rounding with above method introduces a systematic bias when adding many rounded numbers in only one, the negative or positive number area.
This bias is not present in newLISP or 'C' and other languages using round-to-nearest type, unbiased rounding as the individual bias evens itself out including when only rounding either positive or negative numbers.
See the Wikipedia article: http://en.wikipedia.org/wiki/Rounding
In fact for financial accounting floating point numbers shouldn't be used at all but only decimal mode calculations, e.g. using the BCD mode in the i386 family of processors. This is also why Java has a BigDecimal class.
Rounding with above method introduces a systematic bias when adding many rounded numbers in only one, the negative or positive number area.
This bias is not present in newLISP or 'C' and other languages using round-to-nearest type, unbiased rounding as the individual bias evens itself out including when only rounding either positive or negative numbers.
See the Wikipedia article: http://en.wikipedia.org/wiki/Rounding