Cloud Computing

Hi fellow pilots,

Is "The Cloud" a physical server somewhere?

On my hard drive, information is stored on physical media, given an address, and kept there until retrieved.

So, how does the "Cloud" actually store information, if not on a hard drive somewhere?

How is it addressed and retrieved?

Thanks,
Terry
 
Every ten years or so, computing careens back and forth between central-site and distributed (toward the user) computing. With the cloud, we're back at the central site. Your data and applications run on a site far away from your computer/tablet/whatever. The only thing that's unique about the cloud is that the compute complex itself is hopefully distributed to make it available to you from a broad variety of locations.

It's sort of analogous to having an old-style personal answering machine and then switching to voice mail. Your voice mail "answering machine" is in the cloud...your messages are stored there, and you can access them from your home phone, or from some other phone far away when travelling...
 
I think I just saw a closely related thread "What is a Cloud?".

Of course with Cloud Computing, we have public clouds, private clouds, and hybrids. What they are really referring to (and most of it is marketing buzz) is virtualization, elasticity, dynamic scaling, and somewhat ubiquitous connectivity. Basically your data and apps live in the "cloud" and they scale as your needs grow and change. It is a big promise that has had some issues along the road, but it makes sense for a number of applications, such as email and CRM. For small to mid-size companies it can definitely help minimize investment in infrastructure and future upgrades.
 
CLOUD is an acronym. It doesn't really exist at all. You store stuff there, and it automagically turns into tiny bits floating in the air until you need them again.


C.L.O.U.D: Can't Locate Or Use Data
 
Cloud data storage abstracts the actual physical location of the data from the user of the data.

Basically, you don't know, physically, where the data is. There may be 20 different copies out there on different servers in different data centres spread across the world, all synchronized on the back end.

But the "which server" question is not really relevant to the application user. All they want is their little bits of information available when they want it.
 
 
From Wiki:
Cloud storage is a model of data storage where the digital data is stored in logical pools, the physical storage spans across multiple servers (and often locations), and the physical environment is typically owned and managed by a hosting company.
...
As mentioned before, it is yet another flavor of centralized/decentralized computing. You have local data, LAN data, WAN data, and now network data. It all looks local to your application.
Local data is stored on your device.
LAN data is stored on another device in your LAN.
WAN data is typically a business environment with file servers.
NETWORK data is externalized to a service. It could be direct-access data, backups, copies, etc. Think Picasa, Facebook, Amazon, or any number of other places that save your data on their servers.

All of this makes up a "cloud". You can mix how you save the data. That is, you can save it locally, on another device in your home, or copy it to a service you pay for or "share" with, like Google or Facebook. How or where the data lives is irrelevant to your application; as the user, you're not supposed to care. The problem with "cloud" data is speed of access: realtime applications do not necessarily work well with cloud data if it has to come over a network, as the sketch below shows.
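For a rough, unscientific feel of that gap, here's a quick Python sketch that times a local read against a remote fetch. The file name and URL are just placeholders; absolute numbers will vary wildly with your connection.

```python
# Toy comparison: local disk read vs. fetching bytes over the network.
import os
import time
import urllib.request

# Make a 1 MB local file so the test is self-contained.
with open("sample.bin", "wb") as f:
    f.write(os.urandom(1 << 20))

t0 = time.perf_counter()
open("sample.bin", "rb").read()
print(f"local read:  {time.perf_counter() - t0:.4f} s")

t0 = time.perf_counter()
urllib.request.urlopen("https://example.com/").read()  # any remote fetch
print(f"remote read: {time.perf_counter() - t0:.4f} s")
```

The local read typically finishes in well under a millisecond; the network round trip alone usually costs tens of milliseconds before a single payload byte arrives, which is exactly why realtime applications struggle.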
 
On my hard drive, information is stored on physical media, given an address, and kept there until retrieved. (...)
How is it addressed and retrieved?

In most clouds, using HTTP access, parts of the URL constitute the address. In the largest data cloud in existence, Amazon S3, the URL may include the netloc to designate a so-called "bucket" (like BUCKET.s3.amazonaws.com/PATH/KEY). Otherwise, the URL path is used (s3.amazonaws.com/BUCKET/PATH/KEY). There are variations, such as the Rackspace Cloud Files (probably 3rd largest after S3 and Azure) scheme of cf.com/v1/AUTH_tenant/PATH/KEY.
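To make that concrete, here's a toy parser that splits both S3-style URL shapes into a (bucket, key) pair. The hostnames are just the examples above; real services accept more variants than this.

```python
# Toy sketch: split a storage URL into (bucket, key).
from urllib.parse import urlparse

def parse_storage_url(url):
    parsed = urlparse(url)
    host = parsed.netloc
    path = parsed.path.lstrip("/")
    suffix = ".s3.amazonaws.com"
    if host.endswith(suffix):
        # Virtual-hosted style: BUCKET.s3.amazonaws.com/PATH/KEY
        return host[:-len(suffix)], path
    # Path style: s3.amazonaws.com/BUCKET/PATH/KEY
    bucket, _, key = path.partition("/")
    return bucket, key

print(parse_storage_url("https://photos.s3.amazonaws.com/2014/n12345.jpg"))
print(parse_storage_url("https://s3.amazonaws.com/photos/2014/n12345.jpg"))
# Both print: ('photos', '2014/n12345.jpg')
```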

Inside the cloud, the address is typically split right away into a bucket, tenant, or equivalent, and the rest. This way useful segmentation can be maintained and transparency is balanced with opacity. For example, in S3 a bucket always lives within a certain datacenter, and a user can pin bucket placement (with, say, a terrorist attack or a natural disaster in mind). A typical bucket gets churned through a database map function to produce the necessary internal addresses (typically internal base URLs). Well, I think every cloud except Azure does that.

The PATH/KEY selector is fed into a DHT (distributed hash table) function that also takes care of things like redundancy, or Erasure Coding for clouds that use it. In the end it produces the trailing parts of internal URLs, which are smashed back together with the base URLs derived from the buckets. There's a great variety of approaches to doing this. One simple way is to make something like:

bucket => tenant (token)
path => suffix/hash (several, per redundancy or EC)
for example, for cf.com/v1/path/key:
http://10.0.50.20/v1/object/790/e7f...1694790/146c07ef2479cedcd54c7c2af5cf3a80.data
(several of these on 10.0.50.20, 10.1.50.20, 10.3.50.20, etc., as per placement)
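Here's a toy Python version of that whole mapping. The node IPs, replica count, and URL layout are all invented; the point is only the hash-then-place shape of the thing, not how any real cloud does it.

```python
# Toy placement: hash the key, pick N replica nodes, build internal URLs.
import hashlib

NODES = ["10.0.50.20", "10.1.50.20", "10.3.50.20", "10.4.50.20"]
REPLICAS = 3  # invented replica count

def internal_urls(key):
    digest = hashlib.md5(key.encode()).hexdigest()
    partition = int(digest[:3], 16) % len(NODES)  # coarse placement slot
    urls = []
    for i in range(REPLICAS):
        node = NODES[(partition + i) % len(NODES)]  # next-N-nodes placement
        urls.append(f"http://{node}/v1/object/{partition}/{digest}.data")
    return urls

for url in internal_urls("path/key"):
    print(url)
```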

In some cases the boundary between the (internal) bucket and the hashed paths varies somewhat. In particular, it's well known that S3 changes the prefix length depending on how loaded a bucket is. This is called "sharding".
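A toy illustration of why the prefix length matters: the longer the hash prefix used for placement, the more partitions a bucket's keys spread across. (How S3 actually picks prefix lengths is not public; the numbers below are just to show the effect.)

```python
# Longer hash prefix => finer-grained spreading of a bucket's keys.
import hashlib

def partition(key, prefix_len):
    return hashlib.md5(key.encode()).hexdigest()[:prefix_len]

keys = [f"photos/img{i}.jpg" for i in range(10000)]
for plen in (1, 2, 3):
    shards = {partition(k, plen) for k in keys}
    print(f"prefix length {plen}: {len(shards)} partitions in use")
```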

Once the internal URL hits the actual server, it is addressed in exactly the same way your Word document with a path is.

So the high-level overview is mostly trivial. I am omitting a bunch, of course, for simplification.

The really interesting question, which makes or breaks a cloud, is how it deals with failures. In a large cloud, a disk drive or node failure happens anywhere from every few seconds to every hour. So someone on your ops team usually starts his day by looking at the failure report and perhaps ordering reconfigurations, possibly swapping drives, shelves, or racks (in the fail-in-place case, when a rack falls below a certain threshold).

Once the failure is made official, manually or automatically, the cloud has to restore the redundancy. To that end, it starts internal copying (using a version of the internal URLs above -- EC clouds such as Azure also perform the math necessary to reconstruct the data from fragments).
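The simplest flavor of that reconstruction math is plain XOR parity. Real erasure codes are far more sophisticated, but the idea that a lost fragment can be recomputed from the survivors looks like this:

```python
# XOR parity: lose either data fragment and rebuild it from the other two.
a = b"hello world, pt 1"
b = b"hello world, pt 2"
parity = bytes(x ^ y for x, y in zip(a, b))

# "Lose" fragment a; reconstruct it from parity and the surviving fragment.
recovered = bytes(p ^ y for p, y in zip(parity, b))
assert recovered == a
print("reconstructed:", recovered)
```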

The trick is that storage and retrieval have to continue with imperceptible disruption while all that is going on. Obviously the recovery competes with normal access for bandwidth. So it cannot consume too much, or users see a major degradation. But it cannot go too slow either, or else subsequent failures may impair availability or even durability. The balance is tricky to get right. Again, the way, say, S3 does it is a closely guarded secret. Usually a certain scheduling or throttling system decides how many megabytes per second to shuffle for each replication unit. But sometimes it's an ops guy setting percentages, monitoring the cloud's health, and adjusting as needed.
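A toy throttle in that spirit, with invented numbers: give recovery whatever the user traffic leaves free, clamped between a floor (so rebuilds always make progress) and a ceiling (so users don't notice).

```python
# Recovery bandwidth = spare capacity, clamped to [floor, ceiling] MB/s.
def rebuild_rate(link_mbs, user_mbs, floor=50, ceiling=400):
    spare = max(link_mbs - user_mbs, 0)
    return max(floor, min(spare, ceiling))

for user in (100, 700, 950):
    rate = rebuild_rate(1000, user)
    print(f"user traffic {user} MB/s -> recovery gets {rate} MB/s")
```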

P.S. Come to think of it, it's amazing that cloud storage works at all.
 
Cloud data storage abstracts the actual physical location of the data from the user of the data. (...)
What it doesn't abstract is the pipe getting to and from that data. I had an exec ask me to install a cloud-based phone system for his call center last week. I asked him, "You know you only have one data circuit to this building right now, right? If it drops, can the business survive the phones being out?"

Of course they're also headed for a single phone carrier solution... Over that same piece of fiber. We discussed how that might not be the best idea either.

And they wanted only softphones on the PCs. "You realize you still have staff on Win XP machines, right? Is it okay if any of the staff have to reboot to drop a phone call they're on?"

They had some rather fanciful ideas of how to engineer a small call center. ;)
 
By the way, if you're playing with cloud stuff and haven't tried DigitalOcean, you're missing out. Screaming fast. All system images on SSD. And screaming cheap.
 