Most people don't even know that Connection Storms exist. That includes the programmers who write the P2P applications that unleash them, as well as the programmers who write router software that crashes when faced with a Connection Storm instead of containing it.

Tomato's "Advanced > Conntrack/Netfilter" WebGUI page displays Current Connection Counts by Connection State, but only ephemerally (and incompletely!). At present it is extremely costly to obtain these counts from shell scripts, because doing so means reading and parsing the entire Conntrack Table. I would like to invite capable programmers to make the Current Connection Counts available in /proc pseudofiles, so that sys admins can detect, measure, and control Connection Storms with shell scripts.

# # #

I would like to direct the reader's attention to the attachment. Near the start of the chart, at 19:44:26, the number of connections jumped by 150 or more in just 5 seconds, to 381. During the next 10 seconds it continued to rise slowly to 437, levelled off for 10 seconds, and then, 5 seconds later, dropped by 300. What had happened? Here is a record of these connections, by idle time and connection state (the row under the header lists the timeouts in effect, in seconds):

Code:
HH:MM:SS   0s   1s   2s   3s  >3s  Total  State
                                          -180 E1200 S120 R60 F120 T120 C10 CW60 LA30 L120 UU/-30 UA180
19:44:30    0    2    4    0  374    380  U-:316 (DNSe):159 (DNSi):158 E:40 T:21 UA:1 Cl:1 CW:1
19:44:38    0    4    1   19  414    438  U-:371 (DNSe):188 (DNSi):184 E:40 T:25 UA:1 Cl:1
19:44:47    0    3    2    2  419    426  U-:368 (DNSe):186 (DNSi):183 E:38 T:19 UA:1
19:44:53    0    1    0    1  124    126  U-:95 (DNSe):54 (DNSi):41 E:23 T:8
19:44:59    0    4    6    1  108    119  U-:60 E:37 (DNSi):31 (DNSe):30 T:21 UA:1
19:45:06    0    4    0    0   56     60  E:37 T:20 U-:2 (DNSi):2 UA:1 (DNSe):1

Almost all the connections are idle (i.e. defunct) and waiting to time out (i.e. to be removed from the Conntrack Table). The entire 300+ connection surge is UDP.
Unclassified UDP -- the kind that the above-mentioned GUI page forgets to count. These are DNS queries. Half are to the router; the other half are their relays, from the router to the NS. 30 seconds later they are gone -- timed out. This is just one kind of Connection Storm.

# # #

Other kinds of Connection Storms start up numerous SYN Sent or UDP connections that may or may not convert into short-lived Established connections, and that then immediately wait to time out in the Time-Wait, UDP Assured, or UDP "stateless" state. The system under study uses Victek's default timeouts, i.e.:

-180 E1200 S120 R60 F120 T120 C10 CW60 LA30 L120 UU/-30 UA180

I invite those who understand the connection mechanisms in this context to explain why the timeouts for the States that sustain Connection Storms should not be reduced greatly, namely thus:

Code:
SS: 120s -> 5s
TW: 120s -> 5s
UU: 30s -> 5s (== U-)
UA: 180s -> 10s

A Connection Storm involves sharp surges of hundreds, if not thousands, of "connections" that can crash the router. The surges typically last for less than a minute and are made up almost exclusively of defunct "connections" that will time out in 2 or 3 minutes. Once we understand this, it makes sense to get rid of these defunct "connections" before they can build up into a storm that crashes the router. Thus, if the connections are created at a rate of 150 per second and expire in 120 seconds, we risk a storm of a magnitude of 18,000 (150 x 120). But if we reduce the timeouts (of those already defunct "connections") to 5s, the maximum storm size is 750 (150 x 5).

The problem is not (as the conventionally-wise coders have been assuming) that Established TCP connections linger for 5 days and slowly fill up the Conntrack Table. The problem is that some P2P (and other reckless) applications will attempt to create thousands of connections instantly, and while most of those connections never materialize, they create a huge Connection Storm during the attempt.
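The storm-size arithmetic above (creation rate times timeout) can be sketched in a few lines of Python. This is only a rough model under stated assumptions: every new entry is treated as defunct from birth and is removed exactly when its timeout expires. The function names and the 30-second burst length are hypothetical and chosen purely for illustration.

```python
def storm_ceiling(rate_per_s: int, timeout_s: int) -> int:
    """Upper bound on defunct entries if creation is sustained
    for at least timeout_s seconds (rate x timeout)."""
    return rate_per_s * timeout_s


def peak_entries(rate_per_s: int, timeout_s: int, burst_s: int) -> int:
    """Peak table occupancy for a burst lasting burst_s seconds.

    Assumption: entries are idle from creation and are dropped
    exactly timeout_s seconds later, so at most
    min(burst_s, timeout_s) seconds' worth of entries coexist.
    """
    return rate_per_s * min(burst_s, timeout_s)


if __name__ == "__main__":
    rate = 150  # connections created per second, as in the post
    for label, timeout in (("current 120s", 120), ("proposed 5s", 5)):
        print(label,
              "ceiling:", storm_ceiling(rate, timeout),
              "30s burst peak:", peak_entries(rate, timeout, burst_s=30))
```

With the figures from the post this gives ceilings of 18,000 and 750 entries. For a burst shorter than the timeout, the long timeout still lets every entry from the burst pile up, whereas the short timeout caps the pile at five seconds' worth.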
What I have suggested in this message could significantly alleviate this problem, which may be the most serious one affecting Tomato users. I am not calling on developers to solve the problem with a new feature (they cannot be expected to understand the problem), but on a capable programmer to make those Current Connection Counts available, so that sys admins can detect, measure, understand, and control these Connection Storms.

Edit:

1. It would be ideal if, in addition to the Total Current Connection Count, the Current Connection Counts-by-State were available, just as the Timeouts-by-State are available.

2. Ideally, such counts should include the count for "UDP (None) -- U-", in addition to the counts for "UDP Assured -- UA" and "UDP Unresponsive -- UU". "UDP (None)" is missing from the "Advanced > Conntrack/Netfilter" WebGUI page. (As a result, the counts on that page fail to add up when there are U- connections!) (Even though the vast majority of UDP connections are U-, there is no separate Timeout for U-; the Timeout for U- is implicitly inherited from UU. This confusion (and omission) may be rooted in the challenge of "Connection-Tracking" the "Connection-less" UDP flows.)
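To make the request concrete, here is a minimal Python sketch of the counts-by-state tally that currently has to be derived by walking the whole Conntrack Table. It rests on several assumptions: that the table is exposed in the usual text form of /proc/net/ip_conntrack (or /proc/net/nf_conntrack on newer kernels), that "UU" corresponds to entries flagged [UNREPLIED], "UA" to entries flagged [ASSURED], and "U-" to UDP entries carrying neither flag. The function name and the sample lines are made up for illustration and are not real measurements.

```python
from collections import Counter
from typing import Iterable

# Illustrative sample lines only (documentation address ranges).
SAMPLE = """\
tcp      6 1187 ESTABLISHED src=192.168.1.10 dst=203.0.113.5 sport=50212 dport=443 src=203.0.113.5 dst=198.51.100.7 sport=443 dport=50212 [ASSURED] use=1
tcp      6 97 TIME_WAIT src=192.168.1.10 dst=203.0.113.9 sport=50300 dport=80 src=203.0.113.9 dst=198.51.100.7 sport=80 dport=50300 [ASSURED] use=1
udp      17 22 src=192.168.1.10 dst=192.168.1.1 sport=40001 dport=53 src=192.168.1.1 dst=192.168.1.10 sport=53 dport=40001 use=1
udp      17 21 src=198.51.100.7 dst=203.0.113.53 sport=40002 dport=53 src=203.0.113.53 dst=198.51.100.7 sport=53 dport=40002 use=1
udp      17 12 src=192.168.1.11 dst=203.0.113.77 sport=6881 dport=6881 [UNREPLIED] src=203.0.113.77 dst=198.51.100.7 sport=6881 dport=6881 use=1
udp      17 170 src=192.168.1.11 dst=203.0.113.78 sport=6881 dport=6881 src=203.0.113.78 dst=198.51.100.7 sport=6881 dport=6881 [ASSURED] use=1
"""


def count_by_state(lines: Iterable[str]) -> Counter:
    """Tally conntrack entries by state, including UDP with no flag (U-)."""
    counts: Counter = Counter()
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        # nf_conntrack prefixes each line with the L3 protocol name and number.
        if fields[0] in ("ipv4", "ipv6"):
            fields = fields[2:]
        proto = fields[0]
        if proto == "tcp" and len(fields) > 3:
            # Layout: name, protocol number, remaining seconds, state.
            counts[fields[3]] += 1
        elif proto == "udp":
            if "[ASSURED]" in fields:
                counts["UA"] += 1
            elif "[UNREPLIED]" in fields:
                counts["UU"] += 1
            else:
                counts["U-"] += 1  # the class missing from the GUI page
        else:
            counts[proto] += 1
    return counts


if __name__ == "__main__":
    for state, n in sorted(count_by_state(SAMPLE.splitlines()).items()):
        print(state, n)
```

On a live system one would presumably pass the open pseudofile instead of SAMPLE. The sketch still touches every entry in the table, which is exactly the cost that per-state counters in /proc would avoid.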