AOPA forums down?

Ok, I think my ranting brain clouded my thinking.

I agree that getting to 99.something takes good design, redundant hardware, etc. etc.

But would you agree that the difference between 99.99 and 99.999 (that's 52 minutes a year vs. 5 minutes a year) is mostly mindset?

And yeah, we have a few Stratus server systems in our data center, but in the end we found them to be no more reliable than an HA cluster of HP boxes and a good change-control process.
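For reference, the quick back-of-the-envelope behind the 52-minute and 5-minute figures (a minimal sketch, assuming a 365-day year):

```python
# Downtime budget per year at a given availability, assuming a 365-day year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

for availability in (0.99, 0.999, 0.9999, 0.99999):
    downtime_min = (1 - availability) * MINUTES_PER_YEAR
    print(f"{availability:.3%} uptime -> {downtime_min:.1f} minutes of downtime per year")
```

99.99% works out to roughly 52.6 minutes a year, and 99.999% to roughly 5.3.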
 
Depends on what you're doing and what you think could go wrong. The difference between 52 minutes and 5 minutes might require another datacenter, more development, and more processes. It's not like processes are free either. It will certainly require a hell of a lot more testing, and testing is expensive no matter how you do it.
 
It's not just mindset.

The "need" is defined by the mission - a forum or non-pay entertainment site might tolerate 99% reliability, while a life-critical application (a 911 center) would want several 9s of reliability.

After determining the need, you analyze the environment and determine what it takes to achieve the reliability goal. A site that can get grid power from two substations plus a generator backup might need little more, whereas a mountaintop primary site might require a full backup site.

Compare cost to budget. Re-evaluate system needs & siting options. Rinse. Lather. Repeat.
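To put rough numbers on the power example, here's the usual independent-failure arithmetic for redundant sources. The individual availabilities below are invented for illustration, and the sources are assumed to fail independently (which real-world feeds often don't):

```python
# Combined availability of parallel (redundant) power sources that fail
# independently: the load loses power only when every source is down at once.
def parallel_availability(*sources: float) -> float:
    prob_all_down = 1.0
    for availability in sources:
        prob_all_down *= (1.0 - availability)
    return 1.0 - prob_all_down

# Illustrative, made-up numbers: two substation feeds plus a standby generator.
feed_a, feed_b, generator = 0.999, 0.999, 0.99
print(f"two feeds:             {parallel_availability(feed_a, feed_b):.6%}")
print(f"two feeds + generator: {parallel_availability(feed_a, feed_b, generator):.8%}")
```

The same math is why a site with only one marginal feed gets pushed toward a full backup site to reach the next nine.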
 
Did you just accuse me of impropriety and discourtesy? :yikes::yesnod:

I should be offended. Nah - I'll have another juice box, and maybe go for a morning swim. :D

Not a chance, Doc; if I thought you were out of line, you'd hear about it in a PM, or on the phone!

My comment was a broader one, about AOPA, an organization whose mission I fully support, but whose methods (and occasional tone-deafness) sometimes drive me bananas.
 
There are some situations where 5 min/yr of downtime is unacceptable.
 
I agree completely.

I think there are also some (many?) situations where the customer can actually tolerate more downtime than they think.

Every product at my company is critical and should have 99.999% availability... in the mind of the product's owner.

This is why we have an independent reliability assurance team that evaluates each product and classifies it from 0 to 3 on a time-criticality scale. The ones that are a zero don't get as many resources as the ones that are a three.

"Resources" is a word that covers a lot of stuff: 24/7/365 on-call support ($$$), service contracts with faster replacement times for the systems, more development oversight, testing oversight, change review, and yes, more systems or deployment in multiple data centers. (There's a rough sketch of the tier-to-resources idea at the end of this post.)

All I am trying to say is that Reliability is a holistic thing, whereas many people think you can just add more of ingredient 'x' (computer servers, power substations, engines on an airplane, cops on street corners, or whatever). In reality, it takes judicious application of ingredient 'x', along with real risk management, and a reliability mindset.

The hamster wheels at FDK didn't stop turning because of an equipment failure; their problem is a systemic risk-management failure.
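And for the curious, here's a toy sketch of that 0-3 tiering. The scale itself is the real one described above; the specific resource packages are made up for illustration:

```python
# Hypothetical mapping from time-criticality class (0-3) to the resources it earns.
# The 0-3 scale matches the description above; the package contents are invented.
TIER_RESOURCES = {
    0: {"oncall": "business hours", "hw_replacement": "best effort",       "sites": 1},
    1: {"oncall": "business hours", "hw_replacement": "next business day", "sites": 1},
    2: {"oncall": "24x7",           "hw_replacement": "4 hours",           "sites": 1},
    3: {"oncall": "24x7x365",       "hw_replacement": "4 hours",           "sites": 2},
}

def resources_for(criticality: int) -> dict:
    """Look up the support package a product gets for its criticality class."""
    return TIER_RESOURCES[criticality]

print(resources_for(3))  # the class-3 products get the expensive treatment
```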
 
I can grok this. I've seen datacenters where the suits come through and talk all this RAS stuff as if they had any idea what the nuts and bolts actually do. Burning more watts is often not the right solution, and there's even an element of chaos involved in the deal as well. One of the big customers I work for was going to put their dark site in Oklahoma City because they were going to get some big tax breaks and incentives. The recent tornadoes put the kibosh on that plan.
 
In my case, anything after 10 minutes of downtime may (that's a big may) translate to six zeros' worth of lost money.

That's a big, big deal. We take 99.99% very seriously.
 
Hey folks,

I'm trying to make a rare trip to the red board, and I've discovered that in at least three different browsers I'm unable to get in - there's a redirect loop happening. Is anyone else experiencing this?

And the problem with the Red Board being down is.....?
 
AOPA forums workaround kit mailer:

1. Locally procure or locate large empty tin can, juice can works best.
2. Locally procure or locate about 200 feet of string or twine.
3. Remove one end of can, and poke small hole in exact center of other end.
4. Feed string end through hole in can.
5. ......
 
Ignore the requests from users to install the plugin that allows use of TapaCan on mobile devices.

2 weeks later, declare that everyone must upgrade to TinCan 2.0...
 
One place I used to work, we quantified the downtime on our primary system at $100k per hour using hard numbers.

At that point, we could show that "Sir, yes, it'll cost us $200k for such-and-such, but it will reduce our downtime by 3 hours over the lifetime of the system, giving us an internal rate of return of 9%. Please sign this capital authorization now."
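The arithmetic behind that pitch, using only the figures quoted in the post (the cash-flow timing needed for a real IRR isn't given, so this stops at gross and net savings):

```python
# Cost justification from the figures above: $100k per hour of downtime,
# a $200k project, and 3 hours of downtime avoided over the system's lifetime.
downtime_cost_per_hour = 100_000
project_cost = 200_000
hours_avoided = 3

gross_savings = downtime_cost_per_hour * hours_avoided  # $300,000
net_benefit = gross_savings - project_cost              # $100,000
print(f"gross savings: ${gross_savings:,}, net benefit: ${net_benefit:,}")
# The quoted 9% IRR additionally depends on when those savings land over the
# system's lifetime, which the post doesn't spell out.
```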
 
After a week of decent performance, it's down again. :(
 
Been a dry week out on the AOPA drinking game.
 