Infrastructure
This page collects general information about our grid infrastructure: news,
upgrade stories, patches to the software stack, network configuration, studies
and so on. Some documents containing local information are, however,
protected.
Gatekeeper Infrastructure
Networking issues
- Network layout for stargrid01, February 2005.
Both stargrid05 and stargrid01 have dual NICs. However, the network routes
from the stargrid nodes to pdsfgrid5 differ, as follows:
stargrid01 to pdsfgrid5
1 anubis.s80.bnl.gov (130.199.80.124) 0.309 ms 0.224 ms 0.237 ms
2 shu.v400.bnl.gov (130.199.136.9) 0.240 ms 0.248 ms 0.243 ms
3 amon.bnl.gov (130.199.3.24) 0.744 ms 0.625 ms 0.628 ms
4 esbnl-bnl.es.net (198.124.216.113) 0.727 ms 0.745 ms 0.618 ms
...
stargrid04 to pdsfgrid5
1 sw13r (130.199.6.124) 0.240 ms 0.172 ms 0.157 ms
2 isis.s19.bnl.gov (130.199.19.1) 0.551 ms 0.555 ms 0.538 ms
3 shu.v500.bnl.gov (130.199.136.25) 0.570 ms 0.539 ms 0.530 ms
4 amon.bnl.gov (130.199.3.24) 1.244 ms 0.946 ms 0.917 ms
5 esbnl-bnl.es.net (198.124.216.113) 1.003 ms 0.977 ms 0.992 ms
...
Note the main difference: stargrid01 goes through anubis, shu, ... while
stargrid04 goes through sw13r, isis, shu, ... .
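The comparison above can also be done mechanically. The following is a minimal sketch, assuming the traceroute output is available as plain text with one hop per line; the names parse_hops and unique_hosts are hypothetical helpers introduced here for illustration, not part of any existing tool.

```python
import re

# One traceroute hop: hop number, host name, IP in parentheses, RTT samples.
HOP_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+\(([\d.]+)\)\s+(.*)$")

# First hops as listed on this page (the remaining hops are elided there too).
STARGRID01 = """\
1 anubis.s80.bnl.gov (130.199.80.124) 0.309 ms 0.224 ms 0.237 ms
2 shu.v400.bnl.gov (130.199.136.9) 0.240 ms 0.248 ms 0.243 ms
3 amon.bnl.gov (130.199.3.24) 0.744 ms 0.625 ms 0.628 ms
4 esbnl-bnl.es.net (198.124.216.113) 0.727 ms 0.745 ms 0.618 ms
...
"""

STARGRID04 = """\
1 sw13r (130.199.6.124) 0.240 ms 0.172 ms 0.157 ms
2 isis.s19.bnl.gov (130.199.19.1) 0.551 ms 0.555 ms 0.538 ms
3 shu.v500.bnl.gov (130.199.136.25) 0.570 ms 0.539 ms 0.530 ms
4 amon.bnl.gov (130.199.3.24) 1.244 ms 0.946 ms 0.917 ms
5 esbnl-bnl.es.net (198.124.216.113) 1.003 ms 0.977 ms 0.992 ms
...
"""


def parse_hops(text):
    """Return a list of (hop_number, host, ip, [rtt_ms, ...]) tuples."""
    hops = []
    for line in text.splitlines():
        m = HOP_RE.match(line)
        if not m:
            continue  # skip elided lines such as "..."
        rtts = [float(v) for v in re.findall(r"([\d.]+)\s*ms", m.group(4))]
        hops.append((int(m.group(1)), m.group(2), m.group(3), rtts))
    return hops


def unique_hosts(route_a, route_b):
    """Return the hosts seen only in route_a and only in route_b."""
    hosts_a = [hop[1] for hop in route_a]
    hosts_b = [hop[1] for hop in route_b]
    only_a = [h for h in hosts_a if h not in hosts_b]
    only_b = [h for h in hosts_b if h not in hosts_a]
    return only_a, only_b


if __name__ == "__main__":
    only_01, only_04 = unique_hosts(parse_hops(STARGRID01), parse_hops(STARGRID04))
    print("only on the stargrid01 route:", only_01)
    print("only on the stargrid04 route:", only_04)
```

Hosts are compared by full name, so shu.v400 and shu.v500 are reported as distinct even though both are interfaces of shu.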
- Traffic measurements, February 2005
Raw results can be found on Eric Hjort's Data Transfer Rate page. The above
is a mix of my own speculation and (mis)understanding.
Tabulated, the results from pdsfgrid5 to stargrid nodes 01, 03 and 04 can be
represented as follows:
- TCP parameters
Further inspection of the TCP parameters showed the following discrepancies:
- tcp_timestamps is disabled on stargrid04. This parameter is believed to
influence TCP window scaling (see RFC 1323 and RTO/RTT measurements) and may
cause the network transfer collapse observed in the pdsfgrid5 to stargrid04
data transfer rate (congestion collapse, described in RFC 896).
- The maximum window size (transmit/receive) on stargrid03 is twice that of
the other nodes.
- The default TCP receive/transmit window is 65535 on stargrid03, 1048576
otherwise (TCP tune).
- txqueuelen (the value can be displayed using ifconfig) is 100 on
stargrid03, 1000 otherwise. A value of 100 prevents window scaling from ever
having a chance to work. The effect of txqueuelen on data transfer is
explained in this DataTag paper.
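Discrepancies of this kind are easy to miss when checked by hand. The following is a minimal sketch of collecting the relevant settings on a Linux node and comparing them across nodes. It assumes the standard /proc/sys and /sys/class/net locations, and an interface name eth0 that may not match the real hosts; read_local_settings and find_discrepancies are hypothetical helper names. In the sample data, the values 65535, 1048576, 100 and 1000 come from the list above, while tcp_timestamps = 1 on stargrid01 and stargrid03 is an assumption (the page only states that it is disabled on stargrid04).

```python
from pathlib import Path


def read_local_settings(iface="eth0", root="/"):
    """Read TCP-related settings on this node; missing entries become None."""
    root = Path(root)
    paths = {
        "tcp_timestamps": root / "proc/sys/net/ipv4/tcp_timestamps",
        "tcp_window_scaling": root / "proc/sys/net/ipv4/tcp_window_scaling",
        "tcp_rmem": root / "proc/sys/net/ipv4/tcp_rmem",
        "tcp_wmem": root / "proc/sys/net/ipv4/tcp_wmem",
        "rmem_max": root / "proc/sys/net/core/rmem_max",
        "wmem_max": root / "proc/sys/net/core/wmem_max",
        "txqueuelen": root / "sys/class/net" / iface / "tx_queue_len",
    }
    settings = {}
    for name, path in paths.items():
        try:
            settings[name] = path.read_text().strip()
        except OSError:
            settings[name] = None  # not available on this kernel or host
    return settings


def find_discrepancies(settings_by_node):
    """Map each parameter to {node: value} wherever the nodes disagree."""
    params = sorted({p for s in settings_by_node.values() for p in s})
    report = {}
    for param in params:
        values = {node: s.get(param) for node, s in settings_by_node.items()}
        if len(set(values.values())) > 1:
            report[param] = values
    return report


# Sample data: tcp_timestamps = 1 on stargrid01/03 is an ASSUMPTION;
# the other values are the ones quoted in the list above.
SAMPLE = {
    "stargrid01": {"tcp_timestamps": 1, "default_window": 1048576, "txqueuelen": 1000},
    "stargrid03": {"tcp_timestamps": 1, "default_window": 65535, "txqueuelen": 100},
    "stargrid04": {"tcp_timestamps": 0, "default_window": 1048576, "txqueuelen": 1000},
}

if __name__ == "__main__":
    for param, values in find_discrepancies(SAMPLE).items():
        print(param, values)
```

In practice, read_local_settings would be run on each node and the resulting dictionaries gathered into one mapping before calling find_discrepancies.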