Re: Garbage collection, part two...

From: Erick Gallesio <eg_at_unice.fr>
Date: Sun, 16 Apr 2000 22:30:18 +0200 (CEST)

Hi Paul,

I apologize for the delay in answering your mail, but I was off the
net last week. Here I am again ...

Most of the time when I get a mail from you, I fear that it will be about GC
problems, and ... Bingo, this is the case ;-) (note that the title of
your mail should have given me a hint).
>
> Sorry for that last truncated post. I accidentally hit the
> send button before I was finished with it.
>
> As I was saying....
>
> The fix to the first garbage collection problem is attached
> to that email. I think it is self contained, but if you
> have any questions please ask.

That is clear and I have applied your modifications to my dev tree.

>
> The second problem is much more subtle, and harder to fix.
>
> It only shows up when the file containing STk_execute_Tcl_lib_cmd
> is compiled at a high level of optimization. We see it when
> we compile with gcc version 2.8.1 at -O2, and only on sparc/solaris.
> However, there is no reason why another aggressive optimizer
> on another platform would not trigger it.

Of course. I'm just astonished that we have not seen this before. The
setjmp/longjmp trick to save registers has worked for quite a long time
now, and it is weird that it does not work anymore. I have tried to
find what the various documentation says about the work that setjmp
must do, and the only thing that is said is that it must save the
context of its caller. For me this context should contain the
registers, but it is true that this is not clearly stated (I don't have
the ANSI doc available, since I'm at home). Anyway, if some systems
don't save registers in a jmp_buf, it is clear that we have to find a
way to do so.
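
For readers who have not met the trick, here is a rough sketch of the
idea, not the actual STk code. It assumes that setjmp copies the
callee-saved registers into the jmp_buf, which the C standard does not
promise; the names scan_registers, scan_words, word_visitor and
count_word are made up for the illustration. A conservative collector
would pass a visitor that marks any word that looks like a pointer to
a cell.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical callback: called once for every word that might be a pointer. */
typedef void (*word_visitor)(uintptr_t word, void *ctx);

/* Walk a memory area word by word and hand each word to the visitor. */
void scan_words(const void *start, size_t nbytes, word_visitor visit, void *ctx)
{
    const unsigned char *p = start;
    size_t i;

    assert(start != NULL && visit != NULL);
    for (i = 0; i + sizeof(uintptr_t) <= nbytes; i += sizeof(uintptr_t)) {
        uintptr_t w;
        memcpy(&w, p + i, sizeof w);   /* avoids any alignment assumption */
        visit(w, ctx);
    }
}

/* Dump the calling context into a jmp_buf living on the stack, then scan it.
 * ASSUMPTION: the implementation stores register contents in the jmp_buf.
 * The standard only says it holds what is needed to restore the caller. */
void scan_registers(word_visitor visit, void *ctx)
{
    jmp_buf regs;

    setjmp(regs);                      /* never longjmp'ed to; used as a snapshot */
    scan_words(&regs, sizeof regs, visit, ctx);
}

/* Example visitor: only counts the words it is shown. */
void count_word(uintptr_t word, void *ctx)
{
    (void)word;
    ++*(size_t *)ctx;
}
```

Note that even a perfect register snapshot cannot help when the
optimizer has already reused the register for something else: the
value is simply gone by the time the snapshot is taken.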

>
> BTW, this was found after several days of excellent detective
> work by my colleague Chi-Hua Chen.
>
I suppose that it was not easy. This kind of bug
is very hard to track down.

> In that function there is an array of string-pointers (char **argv)
> which is used as the argument to the Tcl function invocation.
> This is created by iterating through the arguments and calling
> STk_convert_for_Tcl. A side effect of doing this is to create
> conv_res, which is a STk vector which contains SCM string values
> that point to the same strings as those in argv.
> The comment in that function indicates that conv_res is used to
> avoid GC problems. However, it isn't quite right.
> What happens is that conv_res gets collected and because it
> contains pointers to the strings in argv, these strings are
> getting freed, and argv is then invalid.
>
> When this function is optimized, conv_res gets placed in a register.
> Normally this will protect it from being collected. However, with
> aggressive optimization, because the value of conv_res is not used
> afterwards, that register is then re-used for something else.
> Then, when the GC is invoked sometime in the call
> (*W->fct)(W->ptr, STk_main_interp, argc, argv);
> the value that was in conv_res is then collected and
> all hell breaks loose.
>
Argh. Yes of course.

> I see two ways of fixing this. However each has its disadvantages,
> and I am not sure if there are other places in the code where the
> same problem might show up.
>
> Fix 1 is to trick the optimizer into keeping the value conv_res
> in a register. This can be done by passing it to a dummy
> procedure. For example:
>
> void dummy(SCM v) { } /* Should be in a separate compilation unit */
>
> tkres = (*W->fct)(W->ptr, STk_main_interp, argc, argv);
> dummy(conv_res); /* This reference forces the optimizer to NOT
> discard the value until afterwards. */
>

Not very pretty, but it should indeed work.
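
A variant of the same idea that does not need a separate compilation
unit is to call the dummy through a volatile function pointer, so the
compiler cannot see what is called and cannot drop the argument. This
is only a sketch under that assumption: keep_alive, keep_alive_impl,
total_length and run_command are hypothetical names, and total_length
merely stands in for the real Tcl command invocation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Does nothing; it exists only so that its argument counts as "used". */
void keep_alive_impl(const void *p)
{
    (void)p;
}

/* The pointer is volatile, so it must be reloaded and called at run time.
 * The optimizer cannot inline the call or discard the argument. */
void (*volatile keep_alive)(const void *) = keep_alive_impl;

/* Hypothetical stand-in for the command call that may trigger a GC. */
int total_length(int argc, char **argv)
{
    int i, n = 0;

    for (i = 0; i < argc; i++)
        n += (int)strlen(argv[i]);
    return n;
}

/* owner plays the role of conv_res: the object that owns the argv strings. */
int run_command(void *owner, int argc, char **argv)
{
    int res;

    assert(owner != NULL);
    res = total_length(argc, argv);   /* a collection could happen in here */
    keep_alive(owner);                /* owner must still be live at this point */
    return res;
}
```

Because owner is needed after the call, the compiler has to keep it in
a callee-saved register or on the stack for the duration of the call,
which is where a conservative collector looks.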

> Fix 2 is to explicitly protect conv_res from being garbage collected
> using STk_gc_protect.
>
We can declare it static and do the gc_global at the first
STk_convert_for_Tcl call. I prefer this solution.
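
Roughly, the shape would be as follows. This is a toy model and not
the STk API: toy_gc_protect, toy_is_rooted, toy_root_count and
remember_conversion are invented for the illustration, and the root
table is a plain array of slot addresses that a collector would treat
as always reachable.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ROOTS 16

/* Toy root table: addresses of variables the collector must always scan. */
static void **roots[MAX_ROOTS];
static size_t nroots;

/* Hypothetical registration call: remembers the address of a slot. */
void toy_gc_protect(void **slot)
{
    assert(nroots < MAX_ROOTS);
    roots[nroots++] = slot;
}

/* The static slot plays the role of conv_res. */
static void *conv_res_slot;
static int conv_res_registered;

/* Register the slot on first use, then store the current vector in it. */
void remember_conversion(void *vec)
{
    if (!conv_res_registered) {
        toy_gc_protect(&conv_res_slot);
        conv_res_registered = 1;
    }
    conv_res_slot = vec;
}

/* What a mark phase would ask: is obj reachable from a registered root? */
int toy_is_rooted(const void *obj)
{
    size_t i;

    for (i = 0; i < nroots; i++)
        if (*roots[i] == obj)
            return 1;
    return 0;
}

size_t toy_root_count(void)
{
    return nroots;
}
```

The slot is registered only once, and the object it holds stays
reachable whatever the optimizer does with registers. The price is
that the last vector stays alive until the next conversion replaces it.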

> The trouble with both fixes is that we don't know where else they might
> need to be applied. I am not sure I know how to characterise
> exactly where such a situation might arise. I know that it is
> at least the following:
>
> 1. A SCM value is created such that it can be put in a register, and
> 2. it references a dynamically created structure (such as a string), and
> 3. that reference is copied elsewhere, and
> 4. the SCM value becomes a candidate for collection before the value
> created in 3 is dereferenced.

I think that this is a correct characterization of the problem, and
these are the only cases that cause problems.

> Please help us understand which of the above is the better solution
> (or if there is any other way), and also how we should go about finding
> other such places.

Finding the places where this problem is possible is quite difficult,
but grepping for must_malloc near NEWCELL should help. I have tried to
avoid these interdependencies as far as possible (see append2 for
an example), but I was assuming that registers were correctly marked
during a GC, and that seems to be no longer true.


                -- Erick
Received on Sun Apr 16 2000 - 23:19:30 CEST
