Making [text][/text] more convenient

lwix · Post by **lwix** » Wed Nov 10, 2004 2:19 am

Hi Lutz,
This is an rfc - a request for convenience ;-)
Should the parser be made to ignore the first hard newline after
[text]? This would make it easier to write and read boxed-quoted text in scripts.

For example: here is how we currently write 2 indented paragraphs:
[text]___words words words words
words words words words etc
words words words words etc
words words words words etc
___words words words words
words words words words etc
words words words words etc
words words words words etc
[/text]

If the first hard newline is ignored, we can instead do this
[text]
___words words words words
words words words words etc
words words words words etc
words words words words etc
___words words words words
words words words words etc
words words words words etc
words words words words etc
[/text]

I'm guessing the first real-world difference will be felt on the board:
when we use the style for 'code'.
Question: Should this be extended to the hard newline before [/text]?
i.e. at the end of the quoted text. I don't know.

Lucas

Lutz · Post by **Lutz** » Wed Nov 10, 2004 2:27 pm

So basically the [text], [/text] tags would always be on a line by themselves. I am not sure ... I have to think about this a little bit longer. It is practical for the situation you describe (and I have run into the same situation many times); on the other hand, I have used [text], [/text] many times in a flowing line, and on purpose too. Remember that the [text], [/text] tags are not only for the situation where you have buffers > 2048 characters; they are also the only delimiters that completely suppress any other processing of escape characters, quotes etc. I also wonder what others think?

Lutz
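
For readers less familiar with the delimiters, the escape suppression Lutz mentions can be seen in a small sketch. It assumes a current newLISP and is meant to be run from a script file rather than typed at the prompt; the sample strings are made up for illustration.

```newlisp
;; In a quoted string, \n is turned into a single newline character.
(println (length "a\nb"))              ; 3

;; Between [text] tags nothing is processed: backslash and n stay as typed.
(println (length [text]a\nb[/text]))   ; 4

;; The tags also work inline, and embedded double quotes need no escaping.
(println [text]say "hello"[/text])
```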

Sammo · Post by **Sammo** » Wed Nov 10, 2004 3:29 pm

Please leave [text][/text] as they are. I, too, use them 'inline' when passing long text strings to newLisp.dll via Hans-Peter's NeoBook interface.

eddier · Post by **eddier** » Wed Nov 10, 2004 7:39 pm

I wish [text][/text] were just {}.
Instead of "", [text][/text], and {}, just have "" and {}.

Just my two cents.

Eddie

lwix · Post by **lwix** » Thu Nov 11, 2004 1:57 am

Lutz wrote:So basically the [text], [/text] tags would always be on a line by itself. I am not sure ... I have to think about this a little bit longer...

Lutz

Not at all.

I shall be more precise: the parser would ignore the first newline only if the newline exists.

No current code will be affected except code that already begins with an explicit newline. In that case, another newline will have to be added.

So the proposed change makes the following equivalent:

[text]hello world
hello world
[/text]

will parse the same as

[text]
hello world
hello world
[/text]

In the second example, because the parser finds a single newline as the first character after [text], it skips it. It does not skip any arbitrary character, only if it is a newline. So we can continue to use [text][/text] either way without change. Hence, the proposal does not break the style of coding where we place [text]hello there[/text] on the same line.

My reference to changes to this board referred to the fact that it will be easier for users to write their normal comments interspersed with blocks of code. The board code would not need changing. (if it does use [text][/text] that is.)

Lucas
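
The rule Lucas describes (skip one newline directly after [text], and only if it is there) can be modelled on ordinary strings. This is just a sketch: `skip-first-newline` is a hypothetical helper operating on the already-read string, not something the newLISP parser provides.

```newlisp
;; Hypothetical helper: drop exactly one leading newline, if present.
(define (skip-first-newline str)
  (if (starts-with str "\n")
      (slice str 1)
      str))

;; Both layouts from the post end up as the same data.
(println (= (skip-first-newline "\nhello world\nhello world\n")
            (skip-first-newline "hello world\nhello world\n")))  ; true

;; Inline use is untouched because there is no leading newline.
(println (skip-first-newline "hello there"))
```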

newdep · Post by **newdep** » Thu Nov 11, 2004 7:51 am

Hello All...

Where [text] is very handy, I must agree with Eddie that { } would be a great
alternative. As we have the "newlispEvalStr" function in newLISP,
[text] [/text] is badly needed there.

Tcl also uses { } for eval, and " " is not recommended there. For
newLISP this is purely a shorter notation.

It could be an option, though, to have the choice in a "define/constant/set"
of a user-defined "Comment" like the following (it would have to be included
at the beginning of every source file, otherwise it won't work) -->>

(setq comment '( "[text]" "[/text]" ))

(setq comment '( "{" "}" ))

(setq comment '( "[text]\n" "[/text]\n" ))

Norman.

lwix · Post by **lwix** » Thu Nov 11, 2004 11:15 am

newdep wrote: ...
It could be an option, though, to have the choice in a "define/constant/set"
of a user-defined "Comment" like the following (it would have to be included
at the beginning of every source file, otherwise it won't work) -->>

(setq comment '( "[text]" "[/text]" ))

(setq comment '( "{" "}" ))

(setq comment '( "[text]\n" "[/text]\n" ))

Norman.

Nice idea but there would be a large performance hit on the parser. You could write yourself a preprocessor that does the necessary substitutions prior to running the main script. But that would be too much work and messy ...

However, my suggestion could easily be implemented within the function or macro that is doing the processing. e.g.

Code:

(define-macro (dostuff! _str)
  ;; check for special case
  (if (= (first (eval _str)) "\n")
    (nth-set 0 (eval _str) ""))

  ;; now we do stuff to the string

  nil ;; dummy return
)

As I said, it's just a convenience thing :-)
Lucas.

eddier · Post by **eddier** » Thu Nov 11, 2004 1:55 pm

I see your point, lwix. Also check for white space between the [text] and \n.
So that (I'm using "{}" in place of "[text][/text]"):

Code:

(print { \t\t\t ; white space
Content-type: text/html

<html>
.
.
.
</html>})

equals

Code:

(print {Content-type: text/html

<html>
.
.
.
</html>})

I like it.

Eddie
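
Eddie's variant, which also swallows blanks and tabs sitting between the opening delimiter and the first newline, could be sketched like this. `skip-blank-lead` is a hypothetical helper working on the string after it has been read; it assumes `regex` returns the matched text, its offset and its length, as in current newLISP.

```newlisp
;; Hypothetical helper: drop leading spaces/tabs plus the first newline,
;; but only when nothing else precedes that newline.
(define (skip-blank-lead str)
  (let (m (regex {^[ \t]*\n} str))
    (if m
        (slice str (m 2))   ; (m 2) is the length of the matched prefix
        str)))

(println (= (skip-blank-lead " \t\t\t\nContent-type: text/html\n")
            "Content-type: text/html\n"))  ; true
```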

Lutz · Post by **Lutz** » Thu Nov 11, 2004 2:36 pm

- All the text delimiting tags enclose data, so it would not be wise to change that data when reading and parsing the tags. Imagine you have the sequence [text]LFLFLFLF ... [/text], so the data starts with multiple linefeeds. If you strip away the first linefeed while reading and then serialize the data again to a file or memory, then every time you parse the [text],[/text] tags you would lose characters from your data. The point is that text delimiting tags may reformat for display, but should always serialize the data in a way that it can be parsed back without change.

- We cannot really collapse {,} and [text],[/text] into only using {,}, because as text buffers grow very big the probability of having unbalanced {,} inside is very high. This can be spotted in smaller string portions, but is hard to watch for if you deal with a 50 kByte web page, which perhaps isn't even yours but read from somewhere else. The [text],[/text] tags are multicharacter on purpose; it's a pain to type, but the probability of having a [/text] as part of the text is very low. This is why, for example, XML uses <![CDATA[ and ]]> as text data tags: they are hard to type, but pretty safe not to be part of normal text.

- {,} are convenient when using small portions of text with double quotes inside, as happens frequently in HTML portions or TCL/TK text. In TCL/TK {,} are fine because if they occur inside the string, they are always balanced for correct Tcl syntax.

- The possibility of defining your own delimiter tags doesn't work for newLISP, because in newLISP any object can be serialized to a file and should then be readable back in a standard way. If you let users define custom tags, you lose that ability. Serializing data objects to a file with (save 'symbol) is very important in newLISP as a convenient, quick way to save/reload data.

To make a long story short, I think we leave it like it is.

Lutz
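
Lutz's first point, that a reader which strips a leading linefeed eats data on every save/reload cycle, can be shown with a toy model. Both functions below are hypothetical stand-ins for the real reader and writer; the writer is modelled as the identity because the tags themselves do not alter the data.

```newlisp
;; Hypothetical reader implementing the proposed rule.
(define (read-text raw)
  (if (starts-with raw "\n") (slice raw 1) raw))

;; Hypothetical writer: the data goes back between the tags unchanged.
(define (write-text data) data)

(set 'data "\n\n\n\nbody")          ; 4 linefeeds + 4 characters = 8

;; Three save/reload round trips: one linefeed is lost each time.
(dotimes (i 3)
  (set 'data (read-text (write-text data)))
  (println (length data)))          ; 7, then 6, then 5
```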

eddier · Post by **eddier** » Thu Nov 11, 2004 5:41 pm

Okidoki!

What if {} were made to handle text of indefinite length (just so they are not limited to 2048 chars)? Then one could use {} for inline and [text][/text] for multiline. The only use I can see for the {} buggers is for pattern matching, since I can't use \n, \t, etc. within them. For example, I cannot use

Code:

(print 
  (append
    {<table>\n<tr>\n<td>\n}
    (format {<a href="%s">%s</a>\n} toGo clickable)
    (format { ...

But then again, I now use the convention

Code:

(define (<page> %w)
  (replace "`(.+?)`" %w (string (eval-string $1)) 0))

(print (<page> [text]Content-type: text/html

<html>
...
</html>[/text]))

I don't know, so maybe I'm beating a dead snake here?

Eddie
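
One possible workaround for the missing \n inside {} is to keep the markup in braces and append the newline from a quoted string, where escapes are processed. This is only a sketch; the values given to `toGo` and `clickable` are made up for illustration.

```newlisp
;; Hypothetical values for the two variables used in the post.
(set 'toGo "http://example.com/" 'clickable "example")

;; {} keeps the double quotes literal; "\n" supplies the real newline.
(print (format (append {<a href="%s">%s</a>} "\n") toGo clickable))
```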

Lutz · Post by **Lutz** » Thu Nov 11, 2004 6:09 pm

Making {,} work for unlimited length would be a performance hit for {,} parsing. The [text],[/text] tags call a different function for reading buffers of unlimited length, adjusting (reallocating) memory on the go as required.

If I did this in {,} parsing, together with the balance checking of inner {,}, it would slow {,} down. At the moment the buffer for "," and {,} delimited tokens is allocated on the stack with a size of 2048. I could increase that number to, let's say, 16384, but then I would run out of stack much sooner when the 'C' getToken() is called recursively inside newLISP routines, which may happen quite often.

Lutz
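
The balance problem with {,} on foreign text can be checked with a rough counter. `brace-depth` is a hypothetical helper, not the parser's own logic: it simply counts braces, and a nonzero result means the text could not sit safely between {,}.

```newlisp
;; Hypothetical helper: number of braces still open at the end of str.
(define (brace-depth str)
  (let (depth 0)
    (dolist (c (explode str))
      (if (= c "{") (set 'depth (+ depth 1)))
      (if (= c "}") (set 'depth (- depth 1))))
    depth))

(println (brace-depth "<style>p { color: red; }</style>"))    ; 0, balanced
(println (brace-depth "function f() { if (x) { return; }"))   ; 1, one left open
```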

eddier · Post by **eddier** » Thu Nov 11, 2004 9:00 pm

Tradeoffs Tradeoffs Tradeoffs ... We like small size and efficiency so we can live with it.

Thanks Lutz!

Eddie

lwix · Post by **lwix** » Thu Nov 11, 2004 10:19 pm

Quite true :-)
Thank you.