Why should I pay for this AWS design decision?

I was writing a utility in Python (using boto) to test and play with Amazon’s SQS service. Because boto isn’t particularly well documented where SQS is concerned, I also plan to post some examples (here, on Linuxlaboratory.org, or both). When I had trouble getting a message that had been sent to a queue, I went to the Amazon documentation and found this little gem in the Amazon Web Services FAQ:

I am sure that my queue has messages, but a call to ReceiveMessage returned none. What could be the problem?

Due to the distributed nature of the queue, a weighted random set of machines is sampled on a ReceiveMessage call. That means only the messages on the sampled machines are returned. If the number of messages in the queue is small (less than 1000), it is likely you will get fewer messages than you requested. If the number of messages in the queue is extremely small, you might not receive any messages in a particular ReceiveMessage response. Your application should be prepared to poll the queue until a message is received. Note that with the 2008-01-01 version of Amazon SQS, you’re charged for each request you make, so set your polling frequency with that in mind.

So… if you were planning to decouple application components with SQS under an ‘eventual consistency’ model, keep in mind that Amazon is using the same model, and that they’re charging you for the privilege of eventually getting messages you’ve already paid to put there, messages that aren’t necessarily available at any given point in time. I personally think this is a little goofy, and wrong.
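To make the FAQ’s advice concrete, here is a rough sketch of what “poll the queue until a message is received” looks like, and how the request count climbs while you wait. It does not talk to SQS or use boto at all: ToyQueue, poll_until_received, and the server and sample counts are all made up for illustration, and the sampling is uniform rather than the weighted sampling the FAQ describes. It only mimics the behaviour the FAQ describes, it is not how Amazon actually implements it.

```python
import random
import time


class ToyQueue:
    """Toy stand-in for a distributed queue (hypothetical, not SQS).

    Messages are scattered across several servers, and each receive
    call only looks at a random subset of those servers.
    """

    def __init__(self, servers=10, sample_size=3, seed=None):
        self._rng = random.Random(seed)
        self._servers = [[] for _ in range(servers)]
        self._sample_size = sample_size
        self.requests = 0  # every receive call counts, empty or not

    def send(self, body):
        # Store the message on one randomly chosen server.
        self._rng.choice(self._servers).append(body)

    def receive(self, max_messages=1):
        # Sample a few servers and return only what they hold.
        self.requests += 1
        found = []
        for server in self._rng.sample(self._servers, self._sample_size):
            while server and len(found) < max_messages:
                found.append(server.pop(0))
        return found


def poll_until_received(queue, max_attempts=50, delay=0.0):
    """Call receive() until it returns something or attempts run out.

    Returns (messages, attempts_made).
    """
    for attempt in range(1, max_attempts + 1):
        messages = queue.receive()
        if messages:
            return messages, attempt
        time.sleep(delay)  # the polling frequency the FAQ tells you to tune
    return [], max_attempts


if __name__ == "__main__":
    queue = ToyQueue(seed=42)
    queue.send("hello")
    messages, attempts = poll_until_received(queue)
    print("got %r after %d billable request(s)" % (messages, attempts))
```

With only one message in the toy queue, most receive calls sample servers that don’t hold it, so the loop usually makes several requests before the message shows up. Every one of those empty responses is a request you’d be billed for under the pricing the FAQ mentions.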

If I put a message in a queue, I should be charged for actually getting the message. I should *not* be charged for checking to see if Amazon’s internal workings have made my messages available to me yet.

  • http://felter.org/ Wes Felter

    Amazon would have us believe that you have to pay one way or the other; presumably they chose eventual consistency because it is cheaper to implement than the alternatives.

  • http://groovie.org/ Ben Bangert

    Also, the latest boto’s SQS support appears to be broken entirely. I found this out on our company’s project when I accidentally used easy_install and got the latest boto. Reverting to boto 0.9d apparently remedies the situation. I’d suggest trying 0.9d and using fetch_queue, because with the latest boto it always came back empty.

  • http://codedemigod.com Alaa Salman

    I believe that this design decision is shared by all distributed, fault-tolerant systems. I was reading an article about their SimpleDB, I think, where they said that the system sync occurs after something like 6 minutes. So you can’t count on it for any real-time data handling.

    Think about it: how can you make a distributed, fault-tolerant system that acts synchronously?

    How the system updates (and thus how frequently), of course, is another matter. There’s a lot of potential for innovation here.

  • http://www.ironfroggy.com/ Calvin Spealman

    Fault tolerance is just something you should expect to need, and that is all this is. The system is designed for massive scale, so obviously there is a trade-off at the lower end of use.

    Kind of ironic how Amazon and Google and friends built these massively scaling platforms, turned around to sell them to others, just to have an enormous number of teeny, tiny things running on them.