AOPA forums down?

Ok, I think my ranting brain clouded my thinking.

I agree that getting to 99.something takes good design, redundant hardware, etc. etc.

But would you agree that the difference between 99.99 and 99.999 (that's 52 minutes a year vs. 5 minutes a year) is mostly mindset?

And yeah, we have a few Stratus server systems in our data center, but in the end we found them to be no more reliable than an HA cluster of HP boxes and a good change-control process.
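For reference, the quick back-of-the-envelope behind the 52-minute and 5-minute figures (a minimal sketch, assuming a 365-day year):

```python
# Downtime budget per year at a given availability, assuming a 365-day year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

for availability in (0.99, 0.999, 0.9999, 0.99999):
    downtime_min = (1 - availability) * MINUTES_PER_YEAR
    print(f"{availability:.3%} uptime -> {downtime_min:.1f} minutes of downtime per year")
```

99.99% works out to roughly 52.6 minutes a year, and 99.999% to roughly 5.3.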
 
Depends on what you're doing and what you think could go wrong. The difference between 52 minutes and 5 minutes might require another datacenter, more development, and more processes. It's not like processes are free either. It will certainly require a hell of a lot more testing, and testing is expensive no matter how you do it.
 
It's not just mindset.

The "need" is defined by the mission - a forum or non-pay entertainment site might tolerate 99% reliability, while a life-critical application (a 911 center) would want several 9s of reliability.

After determining the need, you analyze the environment and determine what it takes to achieve the reliability goal. A site that can get grid power from two substations plus a generator backup might need little more, whereas a mountaintop primary site might require a full backup site.

Compare cost to budget. Re-evaluate system needs & siting options. Rinse. Lather. Repeat.
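To put rough numbers on the power example, here's the usual independent-failure arithmetic for redundant sources. The individual availabilities below are invented for illustration, and the sources are assumed to fail independently (which real-world feeds often don't):

```python
# Combined availability of parallel (redundant) power sources that fail
# independently: the load loses power only when every source is down at once.
def parallel_availability(*sources: float) -> float:
    prob_all_down = 1.0
    for availability in sources:
        prob_all_down *= (1.0 - availability)
    return 1.0 - prob_all_down

# Illustrative, made-up numbers: two substation feeds plus a standby generator.
feed_a, feed_b, generator = 0.999, 0.999, 0.99
print(f"two feeds:             {parallel_availability(feed_a, feed_b):.6%}")
print(f"two feeds + generator: {parallel_availability(feed_a, feed_b, generator):.8%}")
```

The same math is why a site with only one marginal feed gets pushed toward a full backup site to reach the next nine.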
 
Did you just accuse me of impropriety and discourtesy? :yikes::yesnod:

I should be offended. Nah - I'll have another juice box, and maybe go for a morning swim. :D

Not a chance, Doc; if I thought you were out of line, you'd hear about it in a PM, or on the phone!

My comment was a broader one, about AOPA, an organization whose mission I fully support, but whose methods (and occasional tone-deafness) sometimes drive me bananas.
 
There are some situations where 5 min/yr of downtime is unacceptable.
 
I agree completely.

I think there are also some (many?) situations where the customer can actually tolerate more downtime than they think.

Every product at my company is critical and should have 99.999% availability... in the mind of the product's owner.

This is why we have an independent reliability assurance team that evaluates each product and classifies it from 0 to 3 on a time-criticality scale. The ones that are a zero don't get as many resources as the ones that are a three.

"Resources" is a word that covers a lot of stuff: 24/7/365 on-call support ($$$), service contracts with faster replacement times for the systems, more development oversight, testing oversight, change review, and yes, more systems or deployment in multiple data centers. (There's a rough sketch of the tier-to-resources idea at the end of this post.)

All I am trying to say is that Reliability is a holistic thing, whereas many people think you can just add more of ingredient 'x' (computer servers, power substations, engines on an airplane, cops on street corners, or whatever). In reality, it takes judicious application of ingredient 'x', along with real risk management, and a reliability mindset.

The hamster wheels at FDK didn't stop turning because of an equipment failure; their problem is a systemic risk-management failure.
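And for the curious, here's a toy sketch of that 0-3 tiering. The scale itself is the real one described above; the specific resource packages are made up for illustration:

```python
# Hypothetical mapping from time-criticality class (0-3) to the resources it earns.
# The 0-3 scale matches the description above; the package contents are invented.
TIER_RESOURCES = {
    0: {"oncall": "business hours", "hw_replacement": "best effort",       "sites": 1},
    1: {"oncall": "business hours", "hw_replacement": "next business day", "sites": 1},
    2: {"oncall": "24x7",           "hw_replacement": "4 hours",           "sites": 1},
    3: {"oncall": "24x7x365",       "hw_replacement": "4 hours",           "sites": 2},
}

def resources_for(criticality: int) -> dict:
    """Look up the support package a product gets for its criticality class."""
    return TIER_RESOURCES[criticality]

print(resources_for(3))  # the class-3 products get the expensive treatment
```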
 
I can grok this. I've seen datacenters where the suits come through and talk all this RAS stuff as if they had any idea what the nuts and bolts actually do. Burning more watts is often not the right solution, and there's even an element of chaos involved in the deal as well. One of the big customers I work for was going to put their dark site in Oklahoma City because they were going to get some big tax breaks and incentives. The recent tornadoes put the kibosh on that plan.
 
In my case, anything after 10 minutes of downtime may (that's a big may) translate to six zeros' worth of lost money.

That's a big, big deal. We take 99.99% very seriously.
 
Hey folks,

I'm trying to make a rare trip to the red board, and I've discovered that in at least three different browsers I'm unable to get in - there's a redirect loop happening. Is anyone else experiencing this?

And the problem with the Red Board being down is.....?
 
AOPA forums workaround kit mailer:

1. Locally procure or locate large empty tin can, juice can works best.
2. Locally procure or locate about 200 feet of string or twine.
3. Remove one end of can, and poke small hole in exact center of other end.
4. Feed string end through hole in can.
5. ......
 
Ignore the requests from users to install the plugin that allows use of TapaCan on mobile devices.

2 weeks later, declare that everyone must upgrade to TinCan 2.0...
 
One place I used to work, we quantified the downtime on our primary system at $100k per hour using hard numbers.

At that point, we could show that "Sir, yes, it'll cost us $200k for such-and-such, but it will reduce our downtime by 3 hours over the lifetime of the system, giving us an internal rate of return of 9%. Please sign this capital authorization now."
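The arithmetic behind that pitch, using only the figures quoted in the post (the cash-flow timing needed for a real IRR isn't given, so this stops at gross and net savings):

```python
# Cost justification from the figures above: $100k per hour of downtime,
# a $200k project, and 3 hours of downtime avoided over the system's lifetime.
downtime_cost_per_hour = 100_000
project_cost = 200_000
hours_avoided = 3

gross_savings = downtime_cost_per_hour * hours_avoided  # $300,000
net_benefit = gross_savings - project_cost              # $100,000
print(f"gross savings: ${gross_savings:,}, net benefit: ${net_benefit:,}")
# The quoted 9% IRR additionally depends on when those savings land over the
# system's lifetime, which the post doesn't spell out.
```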
 
After a week of decent performance, it's down again. :(
 
Been a dry week out on the AOPA drinking game.
 