NWS threatens to "throttle" access to wx data to save bandwidth

Pilawt
I usually go direct to NWS websites for information, and have noticed more and more disruptions to their websites. From the article, the biggest impact will be to third-party websites, which may flood NWS servers with many simultaneous requests driven by their own user demand.
 
Wonder how this will affect WeatherSpork
 
I'm not any kind of technical expert, but 60 connections seems low... couldn't they just mirror it, or use whatever magic-box technology is available, and distribute it out that way? Put two more computers out there and now it's down to only 20 connections per computer. Hell, charge commercial operations for the right to get the data. The gov't can't give everything away for free.
 
60 connections per minute... times how many tens or hundreds of thousands of users?
 
Reddit’s /r/DataHoarder cried out in pain and then was silenced. LOL

(As nutty as it sounds, there are private individuals who download ALL data from their chosen sources. Their home electric bills are high.)

Of course the government could implement file transfer protocols like BitTorrent that actually work to offload their servers... but that won’t ever happen. The tech has been around a long time now...

There are also content delivery networks that specialize in such things. It’s not exactly a design surprise that most orgs use them both for throttling real users and for soaking up malicious DDoS attacks...

A DDoS could as easily be what triggered the announcement as real downloads...

Never trust the Internet. Rule number one.
 
Certain people in government have long wanted to privatize NWS, the same way some want to privatize ATC. The thinking is that weather data is used to support private interests. Throttling the data or defunding it is not the way to accomplish that kind of goal.

This is something the various aviation membership organizations should be all over. Right, AOPA?
 

To be fair, I don’t think either government or private businesses really paid attention to the fact that some houses now have internet connections faster than what used to serve entire data centers not very long ago.

We all should have throttled ALL connections from the very start. If the user wants more data, they pay more. Something has to pay the bandwidth and storage bill.

When even Google is now slapping storage limits on its commercial products, “unlimited” was never realistic on anything.

We all got away with it because residential broadband is a couple decades behind here and all the upstream links did the throttling via over-subscription.

Most city houses here are starting to see close to 1Gb/sec coverage. Some have 2.

I built data centers with dual 100Mb/sec pipes that were never full... with 1000 corporate websites inside each one... didn’t take long to make that look quaint.

One of our current platforms now has over 6 million registered users. Multiple load balanced servers, multiple sites, automatic scaling for load hits, etc etc etc.

And we’re probably smaller by a couple orders of magnitude on NWS data demand...
 
That's 60 connections per user per minute. Multiply by a few million users.

Of course besides the bandwidth, most folks realize their browser makes 10-20 connections simultaneously per website, right?

LOL.

I assume NWS is talking about file transfer protocols that can’t multi-stream, and not about their own website. Haha.

“Connections” is one way to meter, but usually it’s a stupid one. The problem is they know many users come from behind a single NATed public IP address at large orgs, so they can’t just throttle by IP and have to use something more intelligent.

And as soon as they do that (like looking at referrer details and such), the bad guys simply tell their software to lie.

He who dies with the most erlangs, wins... as we always said in telecom. :)
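
To put rough numbers on the browser-connections-plus-NAT problem (all assumed, back-of-envelope only — the connections-per-page figure is a guess, not anything NWS has published):

```python
# Back-of-envelope sketch with assumed numbers: why a per-IP connection cap
# interacts badly with NAT and with browsers that open many parallel connections.
CONNECTIONS_PER_PAGE_VIEW = 15   # assumption: HTML + JS + CSS + images per page view
CAP_PER_IP_PER_MINUTE = 60       # the announced per-IP limit

def page_views_per_user(users_behind_nat):
    """Page views per minute each user gets before the shared IP hits the cap."""
    return CAP_PER_IP_PER_MINUTE / (CONNECTIONS_PER_PAGE_VIEW * users_behind_nat)

for users in (1, 4, 50):
    print(f"{users} users behind one NAT IP -> "
          f"{page_views_per_user(users):.2f} page views/min each before throttling")
```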
 
They're throttling only certain resources according to the article, so probably not the images, JavaScript, etc. that @denverpilot is talking about.

The target isn't us human users: it's bots and scrapers. We had to do the same thing for a tool I developed with the UN — it was getting hit so much by bots and scrapers from a small number of IP addresses that we had to limit the number of connections per minute per IP. The change should be invisible to humans using the site, but if you have a scraper or bot hitting it 1,000×/minute, you'll see the difference.

FWIW, this is absolutely standard, modern sysadmin/firewall stuff. I know the government isn't always great at IT, but in this particular case, they're not guilty of any of the ignorance people in this thread are accusing them of. Any website that contains popular data (weather, census data, COVID-19 caseloads, aid numbers, etc) can get hammered into oblivion by a few badly-designed scripts scraping them over and over.
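
For the curious, the mechanism itself is simple. Here's a minimal sketch of a per-IP, per-minute limit of the kind described above — illustrative only; in practice you'd usually enforce it at the web server, firewall, or CDN layer (e.g. nginx's limit_req) rather than in application code:

```python
# Minimal sketch of a fixed-window, per-IP request limit. Illustrative only;
# real deployments usually enforce this in the web server, firewall, or CDN.
import time
from collections import defaultdict

LIMIT_PER_MINUTE = 60
_counters = defaultdict(lambda: (0, 0))   # ip -> (window_minute, request_count)

def allow_request(ip, now=None):
    now = time.time() if now is None else now
    minute = int(now // 60)
    window, count = _counters[ip]
    if window != minute:              # new minute: start a fresh counter
        _counters[ip] = (minute, 1)
        return True
    if count < LIMIT_PER_MINUTE:      # still under the cap
        _counters[ip] = (minute, count + 1)
        return True
    return False                      # over the cap: answer with HTTP 429 instead

# A scraper hammering from one IP gets exactly 60 requests in that minute:
print(sum(allow_request("203.0.113.7", now=0) for _ in range(100)))   # prints 60
```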
 
If you need more than 60 connections per minute you're a commercial user making money on the taxpayer's dime. I don't see any reason it should be higher.

Yea, god forbid the government provide a service that private industry gets to fix with an app layer so that it actually works and doesn't need to be deciphered with a secret decoder ring. :D
 
It's not so much that — it's just that a lot of people writing those scripts don't know what they're doing, and they end up launching unintentional denial-of-service attacks. Your government in the U.S. is, as far as I know, the best in the world at making free data available to the public (a compliment about the U.S. from a citizen of a foreign country—enjoy it!), but any web site can come down under bad behaviour. Sometimes the culprit is simply a university student doing research, or someone putting up a hobby site, with just enough PHP or Python experience to be dangerous—it's incompetence much more often than malice.

I've been a user of U.S. government data via my free/hobby ourairports.com website since about 2007. I try to be a good user, caching data as much as possible and waiting a minimum time before I download the same thing twice. They don't seem to mind my doing that. But there are lots of people who don't exercise those basic good manners with free data and open APIs, so any popular free data site ends up having to implement throttling eventually (OpenStreetMap's Nominatim geocoder, for example, also allows a maximum of 60 connections per IP per minute).
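
The "good manners" part is only a few lines of code, too. Here's roughly the shape of it — a sketch, not the actual ourairports code, and the URL and interval are placeholders:

```python
# Sketch of a polite client: keep a local copy and only re-fetch when it is
# older than some minimum interval. The URL and interval are placeholders.
import os
import time
import urllib.request

CACHE_FILE = "wx_cache.txt"
MIN_AGE_SECONDS = 15 * 60          # don't bother the server more than every 15 min
URL = "https://example.com/data/latest-observations.txt"   # hypothetical endpoint

def get_data():
    if os.path.exists(CACHE_FILE):
        age = time.time() - os.path.getmtime(CACHE_FILE)
        if age < MIN_AGE_SECONDS:                    # fresh enough: reuse the copy
            with open(CACHE_FILE, encoding="utf-8") as f:
                return f.read()
    with urllib.request.urlopen(URL, timeout=30) as resp:   # otherwise fetch once
        data = resp.read().decode("utf-8")
    with open(CACHE_FILE, "w", encoding="utf-8") as f:      # and cache it locally
        f.write(data)
    return data
```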
 
Sometimes the culprit is simply a university student doing research, or someone putting up a hobby site, with just enough PHP or Python experience to be dangerous—it's incompetence much more often than malice.

I’m sure you know this but the days of a single hobbyist affecting a properly built large scale website were over ten or more years ago.

Many thousands of hobbyists, perhaps.

Modern CDNs (Content Delivery Networks) and server auto-scaling can handle DDoS attacks by nation-states these days.

I suspect NWS is just ten years behind the curve of modern tech usage like most government websites. They probably have some small subset of the normal tools deployed and are adding one everyone else added a decade ago.

Like you said, no big deal. Number of connections isn’t how CDNs do it now — it’s smarter and more distributed than that, and a crap-load more complex.

They’ll learn that counting connections doesn’t work, like we did. I can get all the IP addresses I want and get around that limit even as a home gamer, let alone how badly I could pound them with work resources.

Anybody actually making money off that data who needs it already coded around the connection problem within 24 hrs of them announcing it and bought enough public IPs that they’ll never know, for less than the cost of a steak dinner out. LOL.

With our work tools I could automate creating and destroying as many machines on as many random public IPs as I needed, each one grabbing whatever they could, and a new ten or twenty globally every hour, if I absolutely had to have their data. Just scaling what you’re doing to be “nice” to that one website.

Easy peasy to beat anything that’s only counting connections. Really cheap too. Yay “cloud”. LOL.

Anybody with $10K to blow can take down any website that doesn’t live behind a well run CDN and load balancers these days. Even then depending on design, the site will struggle.

Maybe it’ll help that they’ll probably require user auth for all of it so the coder would have to code the downloader to apply for a pile of new user accounts, but that can also be automated. Captchas don’t all work anymore. There’s code to get around those, too. Ha.

One of our customers mandated we add captchas. We contracted it to Google, integrated it, and said if that ain’t good enough, we certainly won’t do better. Have fun with that. They’re mostly useless now. But they do require extra coding and CPU cycles so it raises the cost of making thousands of fake accounts. Still not that hard though.
 
I’m sure you know this but the days of a single hobbyist affecting a properly built large scale website were over ten or more years ago.

Many thousands of hobbyists, perhaps.
Yes, and that's exactly what a popular site like the NWS gets, hence the per-IP connection limit.

As for CDNs, sure, those can help, especially with static or slow-changing resources like images, documents, JS, CSS, etc. With dynamic data, you could use a push strategy, pregenerating as many permutations of your pages as possible and pushing them to the CDNs, but in the end, you run into the same kind of problem Google, Twitter, Facebook, etc. ran into, and you have to start building your own giant data centres with clever tricks like BigTable, Hadoop, etc. And then you run into a couple of problems:

1. You don't have investors willing to pour in hundreds of millions to build those data centres (taxpayers can be a mite stingy compared to VCs :) ).

2. You can't easily attract the top talent needed to build them with civil-service salaries and no possibility of stock options or similar.

So yes, the American taxpayers could pour a lot of money into the NWS or other web properties to make them performant — enough, that is, to hire talent away from the Googles and Facebooks of the world (or even the small startups that hope to become a Google or Facebook and make their early employees into 100 millionaires) — or, alternatively, they could just throttle their services to a maximum of 60 connections per IP per minute, which costs a few thousand dollars (mostly around communications with the public), and has no impact on ordinary, interactive users.
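
To illustrate the "push" strategy mentioned a couple of paragraphs up: you pregenerate the permutations as static files and let the CDN serve them, so the dynamic backend never sees the traffic. A minimal sketch — the station list and fetch_forecast() are made up for the example:

```python
# Minimal sketch of pregenerating per-station forecast files for a CDN to serve.
# fetch_forecast() is a placeholder for whatever internal system makes the data.
import json
import pathlib

OUTPUT_DIR = pathlib.Path("static/forecasts")
STATIONS = ["KSFO", "KDEN", "KJFK"]          # in reality, thousands of stations

def fetch_forecast(station):
    """Placeholder: pretend this asks the real forecast system."""
    return {"station": station, "forecast": "VFR", "generated": "2021-01-01T00:00Z"}

def pregenerate():
    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
    for station in STATIONS:
        (OUTPUT_DIR / f"{station}.json").write_text(json.dumps(fetch_forecast(station)))
    # a scheduled job would then sync OUTPUT_DIR to the CDN's origin bucket

if __name__ == "__main__":
    pregenerate()
```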
 
Yes, and that's exactly what a popular site like the NWS gets, hence the per-IP connection limit.

Which doesn’t work, like I said. BTDT. LOL.

They’ll learn real quick. We did. Haha.

The escalation never ends.

I think we manage something like 10,000 attack attempts on a single protocol (not mentioning which) per hour, and we aren’t even anywhere near big or noticeable.

People who want the data will code a way to get the data. Computers fight computers faster than people do.

And people trying to throttle botnets assume people are driving manually. Nah. It’s code. As fast as you change your rules, they’ll change their code to beat them.
 
You're talking about deliberate DDoS attacks, which is a whole different kettle of fish. Agreed that per-IP limits don't affect those at all.
 
That's 60 connections per user per minute. Multiply by a few million users.

Depending on what protocols their APIs are based on, it's still not a huge deal even if there are 5 million users hitting those APIs concurrently, 24/7. It needs planning and elasticity, and I would assume NWS has enough money to do it.
 
You might be assuming wrong, based on my own experience doing IT consulting with governments and UN agencies on and off for the last 22 years. There are certainly examples of huge wasteful projects, but they're usually things associated with new government announcements that got lots of newspaper headlines. Regular operating budgets for existing websites, even very important ones, can tend to be stingy.

Remember, the revenue model isn't the same. When a private-sector company spends more money on operations, it's because it expects to make more money from users. More users or visits for public sector website are just an extra cost, with no additional revenue attached to them, because the budget allocation was already fixed for the fiscal year.
 
You're talking about deliberate DDoS attacks, which is a whole different kettle of fish. Agreed that per-IP limits don't affect those at all.

Kinda. Regular users are DDoS attacks these days.

Just try dealing with someone putting a TV ad out to a large State about your site without telling you. LOL.

BTDT. Got the “Saturday alerts and five people monitoring the scaling systems” t-shirt!
 

Agree. Never saw a well funded government ops group.

Stood next to DUAT(S) when it was still in a mainframe at GTE in the late 90s.

GTE didn’t spend a dime more than mandated in the requirements on that thing.

Piece of garbage by then. And it ran many years afterward. Expensive IBM maintenance.

On the other hand, their insistence on COTS software and hardware for the very nice system we made for ATC — whose use case couldn’t be accomplished with COTS equipment — significantly raised the price tag, when it could have been much lower. They assumed COTS meant someone else could work on it or even understand it. Not a chance.

Guess how expensive our maintenance contract was.
 
Agreed about COTS. Software that's really COTS -- commodity stuff with millions of installations, like word processors, email clients, RDBMSs, HTTP servers, etc. -- makes sense, but a lot of what's claimed as COTS is really just unstable, commercialised prototypes with a few launch customers, and that brings all the risks of custom development without the ability to customise it for your own needs, and none of the stability benefits of COTS to compensate. Government procurement specialists don't have the experience to understand that, and get duped every time.
 
On a diff note ... 60 connections per min? What are they running? 486 under someone’s desk?

My ATC facility’s IDS (information dissemination system - the computers we use to pull up approach plates, etc...) reboots and defrags on the mid shift every night. 486 processors. This isn’t Podunk Approach, either.
 

Be thankful; some facilities used to run a Pentium 90 MHz system (and it was so poorly coded that it ran as a function of the CPU clock speed).
 
My ATC facility’s IDS (information dissemination system - the computers we use to pull up approach plates, etc...) reboots and defrags on the mid shift every night. 486 processors. This isn’t Podunk Approach, either.

Wow. Raspberry Pis have more computing power.
 
On a similar note, about 5 years ago every consulting client was convinced that their project had "Big Data" to deal with (it was one of the buzzwords du jour). I'd start by saying "If it fits on my phone, it's not Big Data", then would plunk my 64 GB Android phone into the middle of the table. It was rare one of them had genuine big data that wouldn't fit; usually it was just a few hundred thousand rows of traditional relational data (a million or two at the outside); a conventional SQL database, even non-clustered and running in a low-spec'd VM, will eat that for breakfast and come back for seconds and thirds, so it was silly to go looking at complicated solutions.
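
If anyone doubts the "eat it for breakfast" claim, here's a quick, self-contained demonstration with synthetic data — half a million rows aggregated by plain SQLite, about the least muscular SQL engine you could pick:

```python
# Quick demonstration with synthetic data: half a million rows is not "Big Data".
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (station TEXT, temp_c REAL)")
conn.executemany("INSERT INTO obs VALUES (?, ?)",
                 ((f"ST{i % 500:03d}", i % 40) for i in range(500_000)))
conn.execute("CREATE INDEX idx_station ON obs (station)")

start = time.time()
rows = conn.execute("SELECT station, AVG(temp_c) FROM obs GROUP BY station").fetchall()
print(len(rows), "stations aggregated in", round(time.time() - start, 3), "seconds")
```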
 
Big data... ha.

My current pet ingests somewhere in the neighborhood of 150 TB per day, depending on the day. We'll very likely see 200 TB/day or more before the end of '21.

 
Impressive! I had the rare privilege of a personal guided tour of the underground CERN data centre in Geneva in 2014. They don't talk about anything smaller than petabytes, and those are just small, selective samplings of the data from the 27 km Large Hadron Collider that runs under Geneva. ;) Personally, I'm happy to stick with "small data" (a few million or tens of millions of rows), because that's my specialty.
 
But CERN’s stuff is probably actually important. Yours probably is too. And I’m going to stop right there.
 
We all hope our stuff is important, but it's impossible to ignore how much of the real performance innovation comes from the people keeping the porn sites online. :-/
 
Most people would be shocked to learn just how much time, money, materials, effort, space, and storage is burned just to meet regulatory requirements (real and perceived) in some industries.
 
To be fair, I don’t think either government or private businesses really paid attention to the fact that some houses now have internet connections faster than what used to serve entire data centers not very long ago.

I was involved in a satellite that we sent up in the late '90s that provided internet access for users from Egypt to the Philippines. At some point it carried something like 90% of the residential internet traffic from Kuwait & Saudi Arabia. (And yeah, the majority of that traffic was exactly what you expect it to be.)

It was a HUGE deal. It had so much bandwidth that the owner subleased some of the DBS transponders to a satellite TV station in the end.

That satellite in TOTAL had 840 Mb/s of bandwidth - one-directional. Today I have 1,000 Mb/s at home - bidirectional.
 
... then would plunk my 64 GB Android phone ...

LOL I hear ya. Current phone is twice that, and I bought 24 TB of storage for the house; I have no idea why, other than it was cheap.

Joked with a co-worker if we ever need to rearrange the work NAS box I can bring my silly Synology in and back the whole thing up to it easily without deleting my stuff, we reconfigure it, and copy it back. Ha.

Only downside would be the home one only has dual 1G Ethernet. Would take a while. Ha.

We run the entire company on a quad bonded Ethernet one, but the next one will have SFP+. Hehehe.

We’re good at squeezing every penny from our budget since we understand measuring actual usage, but sometimes it wastes time and money when we argue about the specs of a new box instead of realizing the “expensive” one is a whopping $3K. LOL.

We did that the other day. Should we get the 2TB drives or 4TB for this array? Who cares... by the time that box fills (audio recordings highly compressed) we’ll just attach it to a new SAN. Ha.
 