[ts-gen] MacOS Shim problems
Richard Pruss
boadie at gmail.com
Thu May 21 05:18:01 EDT 2009
Had a little bit of time this evening and made some progress
poking at the hang with gdb.
It is spinning in a loop in leaf.c at line 180:
"
for(nat i(mid(a, b)); a<b && !p; i = mid(a, b)) {
switch((x = d.compare(t[i]))) {
"
It is not really clear to me what this does, but here are some values at the time:
i 30551
a 30451
b 30651
x -8
i does not change between iterations.
I uncommented the debug statement but it did not work, so I swapped it
out for a print statement I could understand, and it pretty much just
loops.
Values are i, a, b, x:
Open: 30551 30451 30651 -8 25667636
Open: 30551 30451 30651 -8 25667636
Open: 30551 30451 30651 -8 25667636
Open: 30551 30451 30651 -8 25667636
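One way a loop of this shape can spin, offered only as a guess: a and b move
only inside the arms of the switch, so if compare() returns a value that no arm
handles (x is -8 here), neither bound changes and mid(a, b) keeps giving the
same i (30551 is exactly the midpoint of 30451 and 30651). I have not confirmed
that this is what leaf.c does. The sketch below uses made-up names
(raw_compare, sign, find) and only shows the pattern of collapsing the
comparison result to -1/0/+1 before switching on it.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical comparator: like strcmp(), it may return any negative or
// positive value (for example -8), not just -1 or +1.
static int raw_compare(const char* x, const char* y) {
    return std::strcmp(x, y);
}

// Collapse any int to -1, 0 or +1 so a switch can cover every outcome.
static int sign(int v) { return (v > 0) - (v < 0); }

// Binary search over a sorted table t[0..n); returns the index or -1.
static int find(const char* const* t, int n, const char* key) {
    assert(n >= 0);
    int a = 0, b = n;
    while (a < b) {
        int i = a + (b - a) / 2;            // midpoint of the current range
        switch (sign(raw_compare(key, t[i]))) {
        case -1: b = i;     break;          // key sorts before t[i]
        case  0: return i;                  // found
        case +1: a = i + 1; break;          // key sorts after t[i]
        }
    }
    return -1;                              // not present
}
```

Because every possible result of sign() has an arm, each pass either returns
or shrinks the range, so the loop has to terminate.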
Hope that provides a hint,
Ric
On Sun, May 17, 2009 at 7:44 PM, Richard Pruss <boadie at gmail.com> wrote:
> Bill,
>
> Thanks for these comprehensive e-mails. Your style of clearly writing down
> your thinking is great because it lets one understand what a possible
> problem could be.
>
> After reading both I took the simple step of commenting out the v6 entries in
> /etc/hosts (two v6 addresses, ::1 and fe80::1, which come after the v4
> address but still seem to be tried first).
> That seemed to fix it: both 127.0.0.1 and localhost now worked and
> exs/test.rb seemed to work well. Unfortunately it was an illusion; the two
> problems turned out to be separate,
> and later I started to run into the hang again, but more on that later.
>
> Feeling very happy with what I thought was my now working shim I went on to
> try your patch.
>
> A v4-only environment worked.
> With the 2 v6 addresses configured in the hosts file it still failed with the
> same error, errno 61.
> With the 2 v6 addresses configured in the hosts file but removed from the
> interface it failed with a new error:
> Connect: Can't assign requested address
> errno: 49 port:7496
>
> So I guessed it was still trying the v6 address, but I saw your fprintf's in
> net::Descriptor, so I uncommented them and added
> some debugs for the address family.
>
> .
> Open: 5 1 30 30
> Open: 5 1 30 30 49
> Open: 5 1 30 30 49
>
> So I found that you are not using the iterator in the for loop for the
> connect and open calls; you are using a instead of p.
> If you put p in, the address walk works and it connects on v4.
> Open: 4 30 30 1
> Open: 4 1 30 30 61
> Open: 4 1 2 2 61
>
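> Here is the patch. Note that the diff below lists the modified file first, so
> the '-' lines are the fixed version and the '+' lines are the original.
> --- ./lib/inet.c 2009-05-17 19:09:19.000000000 +1000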
> +++ ../shim-090428-1/lib/inet.c 2009-03-12 01:42:42.000000000 +1100
> @@ -35,8 +35,8 @@
> Ai_ptr a(get_addr_info(h, z));
>
> for (Ai_0 p(a); p && !c; p=p->ai_next)
> - c |= fd.open(*p)
> - && connect(*p);
> + c |= fd.open(*a)
> + && connect(*a);
>
> freeaddrinfo(a);
>
> @@ -51,7 +51,7 @@
> struct ::addrinfo filter, *a;
>
> memset(&filter, 0, sizeof(filter));
> - filter.ai_flags = AI_CANONNAME | AI_ADDRCONFIG,
> + filter.ai_flags = AI_CANONNAME,
> filter.ai_family = AF_UNSPEC,
> filter.ai_socktype = SOCK_STREAM;
>
> @@ -72,17 +72,15 @@
> bool net::Descriptor:: open( Ai_c a) const
> {
> int d;
> -struct ::sockaddr *sa;
>
> if ((state == Closed)
> && (d = socket(a.ai_family,
> a.ai_socktype,
> a.ai_protocol)) >= 0)
> state = Open, fd = d;
> - sa = a.ai_addr;
>
> - if (errno) fprintf(stderr, "Open: %3d %3d %3d %3d %3d\n", fd, state, a.ai_family, sa->sa_family, errno); else
> - fprintf(stderr, "Open: %3d %3d %3d %3d \n", fd, a.ai_family, sa->sa_family, state);
> +// if (errno) fprintf(stderr, "Open: %3d %3d %3d\n", fd, state, errno); else
> +// fprintf(stderr, "Open: %3d %3d \n", fd, state);
>
> return state;
> }
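To make the address walk concrete, here is a minimal sketch of a getaddrinfo()
loop that hands the iterator p, not the list head, to socket() and connect().
connect_any is a made-up name and this is not the shim's net::Socket code; it
only illustrates the pattern, assuming plain POSIX sockets.

```cpp
#include <cassert>
#include <cstring>
#include <netdb.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper (not from the shim): try each address returned by
// getaddrinfo() in turn. Returns a connected socket, or -1 if none worked.
static int connect_any(const char* host, const char* port) {
    assert(host != nullptr && port != nullptr);

    struct addrinfo hints;
    std::memset(&hints, 0, sizeof hints);
    hints.ai_family   = AF_UNSPEC;      // accept IPv4 and IPv6 entries
    hints.ai_socktype = SOCK_STREAM;

    struct addrinfo* head = nullptr;
    if (getaddrinfo(host, port, &hints, &head) != 0)
        return -1;

    int fd = -1;
    for (struct addrinfo* p = head; p != nullptr; p = p->ai_next) {
        // Use p here. Using head would retry the first entry every time.
        fd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, p->ai_addr, p->ai_addrlen) == 0)
            break;                      // connected on this entry
        close(fd);                      // failed, move on to the next entry
        fd = -1;
    }
    freeaddrinfo(head);
    return fd;
}
```

With a loop like this, a refused v6 entry costs one failed connect() and the
walk then falls through to the v4 entry.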
>
> Now during the process it had started to hang even though the connect was
> working. My heart sank. I rebooted, went back to
> a hosts file with no IPv6, restored the version with no diffs, etc., but to no
> avail: it hangs for everything. The connect and hang are very much independent
> issues.
>
> Possibly the output of the [ricpruss at Quady:~/dev/shim-090428/log] more
> cmdinput.txt
> #!./shim -f
>
> # Trigger the ReqAccountData request.
>
> cmdinput.txt (END)
>
> triggers an idea for you. It seems to come up through the account query and
> stop on the first real command.
> Same for tick and the others, and exs/test.rb does not do its wonderful song
> and dance anymore.
>
> #!./shim -f
>
> # Trigger the ReqMktData and CancelMktData requests for AMAT and YM,
> # using each of the pos
>
> Looks like something intermittent is hanging command input,
> Ric
>
> On 16/05/2009, at 1:20 PM, Bill Pippin wrote:
>
>> Ric,
>>
>> About your problems getting the shim running on a Mac, in the context
>> of this post, please grant for purposes of discussion that there is
>> some interaction between the issues you've seen with IPv6 network
>> configuration, and the apparent hang where the shim seems to ignore
>> command input. That is, we take as given that your reports about the
>> following are all related, and not distinct problems:
>>
>> 1. There are problems with network configuration for IPv6.
>> 2. ... once the shim starts
>> up properly, the shim seems to hang. More precisely, you are
>> not able to enter commands, or they are being ignored.
>>
>> I'm not sure how this could be so, but it's foolish to ignore a
>> known problem when debugging an unknown one, so here goes.
>>
>> Due to my level of ignorance of your actual situation --- I have no
>> equivalent platform --- I'm shooting in the dark, so please keep a
>> number of possibilities in mind:
>>
>> 1. You might have dns or resolver issues, and in particular
>> timeouts are occurring, so that your patience would become
>> critical; and/or:
>>
>> 2. The shim and the IB tws have different ideas about whether
>> to use IPv4/IPv6-agnostic code, and perhaps changes to
>> the shim will make it compatible with the behavior of the
>> IB tws; and/or:
>>
>> 3. Differences between the Linux and OS X implementations of
>> getaddrinfo() are tripping us up here, and hopefully some
>> possibly subtle change in the way the procedure is called,
>> or its results are used, will fix the problem.
>>
>> Yes, I know that none of the above have any clear-cut link to an
>> apparent failure by the shim to hear your command inputs; but please
>> bear with me, even though I know I'm considering what look like
>> long shots.
>>
>> First, the IB tws gui has long required IP addresses, not host
>> names, when drilling down via Configure->Api->All Api Settings ...
>> to the trusted IP address create/edit dialog. This means that
>> you need to commit to an IPv4 or IPv6 form for your address
>> input. You should know that I have not yet been able to get
>> the shim to successfully connect to the IB tws when the only
>> trusted address is ::ffff:127.0.0.1 . This might be because IPv6
>> is not currently set up on my machine, which I believe to be the
>> case, but in any case you're probably wise to enter 127.0.0.1 as
>> one of the trusted addresses in the gui.
>>
>> By the way, for the feedhost value in your .shimrc file you
>> should *not* give an actual hostname; stick with the numbers above
>> or localhost . You don't want the connection to a local IB tws
>> to go out through your external interface, due to the security
>> issues involved.
>>
>> If the trusted IP address values are incomplete, then if the shim
>> tries to connect using an interface bound to a number not in that
>> list, you should see the "Accept incoming connection attempt"
>> dialog box, so you should already be aware if this is a problem, and
>> still it doesn't hurt to check. In addition, if you are feeling
>> adventurous, you can enter both IPv4 and IPv6 numbers, as separate
>> entries in the trusted IP number list, and in particular you might
>> consider adding both of the values for localhost, 127.0.0.1
>> and ::ffff:127.0.0.1 .
>>
>> If at this point you still have problems, then for the rest,
>> I'm going to ask you to go into the shim's code.
>>
>> First, in lib/inet.c , there is a call to getaddrinfo(), and
>> a loop to traverse the list that is returned by that procedure,
>> with the shim stopping at the first entry to which it can make
>> a successful connect. Since there is anecdotal evidence that
>> OS X may not put the entries in the best possible order, perhaps
>> the shim is committing to a poor choice here, and we'd like to
>> prune that list.
>>
>> There are two ways to do this, one right and one wrong, and
>> since it's your machine, you should feel free to do whatever
>> you like here. Be advised, however, that the second of the
>> two changes I'm about to describe will *never* be added to our
>> publicly released code. The first, however, I plan to add
>> to the next release, and so below is a patch to do that for
>> the tarball you've already obtained:
>>
>> *** old/inet.c Wed Mar 11 10:42:42 2009
>> --- lib/inet.c Fri May 15 21:10:32 2009
>> ***************
>> *** 51,57 ****
>> struct ::addrinfo filter, *a;
>>
>> memset(&filter, 0, sizeof(filter));
>> ! filter.ai_flags = AI_CANONNAME,
>> filter.ai_family = AF_UNSPEC,
>> filter.ai_socktype = SOCK_STREAM;
>>
>> --- 51,57 ----
>> struct ::addrinfo filter, *a;
>>
>> memset(&filter, 0, sizeof(filter));
>> ! filter.ai_flags = AI_CANONNAME | AI_ADDRCONFIG,
>> filter.ai_family = AF_UNSPEC,
>> filter.ai_socktype = SOCK_STREAM;
>>
>> The AI_ADDRCONFIG flag tells getaddrinfo() to return IPv6
>> addresses only if the local system has an IPv6 address
>> configured (and likewise for IPv4), so it skips addresses
>> that probably wouldn't work anyway. It should have already
>> been included in the code, and wasn't since I didn't know of
>> it when I added IPv6 compatibility a couple of years ago.
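If you want to see what the flag changes on a given machine, a small
diagnostic along these lines may help. It is only a sketch: list_families is a
made-up name, not part of the shim, and it simply prints the address family of
each entry getaddrinfo() returns for whatever flags you pass in.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <netdb.h>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical diagnostic (not from the shim): print the family of every
// getaddrinfo() result for host/port. Returns the entry count, or -1 on error.
static int list_families(const char* host, const char* port, int flags) {
    assert(host != nullptr && port != nullptr);

    struct addrinfo hints;
    std::memset(&hints, 0, sizeof hints);
    hints.ai_flags    = flags;          // e.g. AI_CANONNAME | AI_ADDRCONFIG
    hints.ai_family   = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;

    struct addrinfo* head = nullptr;
    int rc = getaddrinfo(host, port, &hints, &head);
    if (rc != 0) {
        std::fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rc));
        return -1;
    }

    int n = 0;
    for (struct addrinfo* p = head; p != nullptr; p = p->ai_next, ++n) {
        const char* name = p->ai_family == AF_INET  ? "AF_INET"
                         : p->ai_family == AF_INET6 ? "AF_INET6"
                                                    : "other";
        std::fprintf(stderr, "entry %d: %s\n", n, name);
    }
    freeaddrinfo(head);
    return n;
}
```

Calling it once with AI_CANONNAME and once with AI_CANONNAME | AI_ADDRCONFIG
for "localhost" should show whether the v6 entries drop out of the list and in
what order the remaining ones arrive.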
>>
>> If the above change doesn't help, you can also try setting the
>> address family from unspecified to AF_INET . This creates
>> a hardcoded dependency on IPv4 addressing, and definitely
>> creates possible future compatibility problems, and so you
>> didn't read me advising this here ;<) If you take this
>> approach, change the line that is the same as the first below
>> to appear instead like the second:
>>
>> filter.ai_family = AF_UNSPEC,
>> filter.ai_family = AF_INET,
>>
>> If, after trying each of the above two changes in turn, you're still
>> having problems, maybe IPv6 configuration isn't a problem after all.
>> In what follows, I'll just describe areas of the code where you can
>> insert printf diagnostics if you're curious and adventurous. If you
>> don't know C at all, you should probably stop reading here.
>>
>> The points of interest include: the main loop, in src/shim.c, to
>> verify that the shim is twiddling its thumbs ; various points in
>> src/wait.c , especially Intake::read(), where you can check how many
>> characters if any are being read from the command input, and the
>> upstream socket input; and elsewhere in the network socket
>> initialization code, that is in lib/inet.c , net::Socket::connect(),
>> the call above to Socket::get_addr_info(), and its call to system
>> getaddrinfo() . In many cases there are printf diagnostics already
>> written, but commented out, and you can simply remove the comment
>> characters and recompile. You will need some idea of what you're
>> doing, however, to make sense of the output.
>>
>> You may find it of interest to determine:
>>
>> * which entry in the getaddrinfo results the shim is taking;
>>
>> * if and when any input is being read in via Intake::read(); and
>>
>> * if indeed the shim is just cycling through its main loop
>> without doing anything.
>>
>> The second probe point above is particularly of interest for the
>> symptoms you describe, although it won't provide any answers, just
>> problem details and maybe more questions, which is why I haven't
>> mentioned it until now.
>>
>> Command input and socket reads are mediated via the system
>> select() call. The key point of interest here, given a shim started
>> from the command line with no piped-in input, so that it should be
>> reading from your keyboard, is whether the shim wakes up when you
>> press the return key. Stdin from the keyboard is line buffered, and
>> so each return should trigger a return from the select(), and a copy
>> of the text into the shim's command buffer, and a printf message
>> if you've uncommented the related statement in wait.c: Intake::read().
>> If command input works even in the most limited fashion, the synchrony
>> will be absolutely unmistakable: you hit return, shim prints message,
>> you hit return again, another message, and so on.
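The select()-then-read probe described above can be tried in isolation with
something like the following. This is a hedged sketch, not the code in
src/wait.c: probe_input and its return codes are made up for illustration, and
it assumes an ordinary POSIX file descriptor such as stdin (fd 0) or a pipe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical probe (not from the shim): wait up to timeout_sec for fd to
// become readable, then read once and report the byte count on stderr.
// Returns bytes read (0 at end of file), -1 on error, -2 on timeout.
static long probe_input(int fd, int timeout_sec, char* buf, std::size_t cap) {
    assert(buf != nullptr && cap > 0);

    fd_set rd;
    FD_ZERO(&rd);
    FD_SET(fd, &rd);

    struct timeval tv;
    tv.tv_sec  = timeout_sec;
    tv.tv_usec = 0;

    int ready = select(fd + 1, &rd, nullptr, nullptr, &tv);
    if (ready < 0)
        return -1;
    if (ready == 0) {
        std::fprintf(stderr, "select: timed out, nothing to read\n");
        return -2;
    }

    ssize_t n = read(fd, buf, cap);
    std::fprintf(stderr, "read: %ld bytes from fd %d\n", (long)n, fd);
    return (long)n;
}
```

Run in a loop against fd 0 from a terminal, each press of return should
produce one "read:" line; if select() keeps timing out while you type, the
input is not reaching the process at all.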
>>
>> If you uncomment the trace statement in Intake::read() of src/wait.c,
>> recompile, and run as in the above para, and the shim doesn't print
>> character counts to the stderr that reflect to some degree your
>> keyboard input, then the shim is truly ignoring its command input,
>> and I don't know why.
>>
>> If the above looks like a long shot, and not very likely to reveal the
>> problem, it is; and if it seems like too much trouble, don't worry
>> about it. At this point, however, I'm out of suggestions; beyond what
>> I've mentioned thus far, all I can do is wait until I have a Mac to
>> work on.
>>
>> As always, further diagnostics and reports from your side are welcome;
>> please let us know what's happening.
>>
>> Thanks,
>>
>> Bill
>>
>
>