Monday, 30 April 2012

5 Things All Java Developers Should Know When Developing for the Cloud

Over the last couple of years, "Cloud Computing" has replaced Web 2.0 as the new buzzword. Everywhere you read, hear and see that the cloud is coming. To most developers, this is still the same old sh*t. If you have experience developing distributed systems, then you should be fine, you say. Well, that is not entirely true: the IT department wants to deploy on a cheap cloud, and therefore some restrictions now apply. I will list 5 things that I think all developers should know when working with a cloud Platform as a Service (PaaS) provider such as Amazon Elastic Beanstalk or Google App Engine (GAE). This list also applies to IaaS architectures. Some of the points might be obvious to the more experienced; nevertheless, they need to be mentioned.

  • Static objects
We all know the difference between an instance variable (non-static) and a class variable (static). We use static to tell the JVM that there should be only one copy of this variable per JVM, shared by all instances of the class (much like a singleton). If the static variable is declared with the "final" keyword, it will not cause a problem in a distributed environment, as the value will never change. The problem arises when we expect the value of the variable to change. As in any clustered environment, GAE and Beanstalk run your application in multiple JVMs. If the value of your static variable changes in one JVM, the change will not be propagated to the rest of the cluster, leading to inconsistencies. I recommend that you avoid static variables unless they are declared "final" and their values are hard-coded, so there is no way to change them at runtime.
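To make the point concrete, here is a minimal sketch of one way to keep mutable state out of static fields. The names SharedCounterStore, InMemoryCounterStore and VisitTracker are hypothetical and not part of any vendor API; the in-memory store only stands in for whatever shared service your platform offers, so treat this as an illustration under those assumptions rather than a finished design.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical abstraction over state that every instance in the cluster can see.
interface SharedCounterStore {
    long incrementAndGet(String key);
}

// In-memory stand-in so the sketch runs locally. A real implementation would
// delegate to a service shared by all instances (an assumption, not shown here).
class InMemoryCounterStore implements SharedCounterStore {
    private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();

    @Override
    public long incrementAndGet(String key) {
        return counters.computeIfAbsent(key, k -> new AtomicLong()).incrementAndGet();
    }
}

class VisitTracker {
    // Fine in a cluster: a hard-coded constant is identical in every JVM.
    static final String COUNTER_KEY = "visits";

    // Problematic in a cluster: each JVM would keep its own copy of this value.
    // static long visits;

    private final SharedCounterStore store;

    VisitTracker(SharedCounterStore store) {
        this.store = store;
    }

    // Mutable state goes through the shared store instead of a static field.
    long recordVisit() {
        return store.incrementAndGet(COUNTER_KEY);
    }
}
```

Two VisitTracker objects that share one store behave like two application instances: the second call to recordVisit() sees the increment made by the first, which a plain static counter in separate JVMs would not.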

  • Caching Objects
This one is related to performance: we cache to avoid expensive operations such as running database queries. Sometimes we need to cache objects in memory, so we implement our own caching strategy using a simple HashMap or one of the caching solutions available out there. Caching has many benefits, but implementing a caching strategy should be approached with care, because caching has the same problem as static objects: your cache lives in the local JVM and therefore will not be visible to the rest of the cluster. There are some solutions. For example, GAE offers a Memcache service, and Beanstalk can make use of Amazon ElastiCache, which is compatible with Memcached. When developing for a PaaS environment, do not implement your own caching system; look for one that is supported by the vendor. I know this can lead to vendor lock-in.
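One way to limit the lock-in is to hide the vendor cache behind a small interface of your own. The sketch below shows a simple cache-aside lookup under that assumption. ClusterCache, InMemoryClusterCache and CachedLookup are hypothetical names, and the in-memory class is only a local stand-in for a shared cache such as Memcache or ElastiCache.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical facade over a vendor-supported cache shared by the cluster.
interface ClusterCache {
    Object get(String key);              // returns null on a miss
    void put(String key, Object value);
}

// Local stand-in so the sketch runs; a real adapter would call the vendor's cache.
class InMemoryClusterCache implements ClusterCache {
    private final Map<String, Object> entries = new ConcurrentHashMap<>();

    @Override
    public Object get(String key) {
        return entries.get(key);
    }

    @Override
    public void put(String key, Object value) {
        entries.put(key, value);
    }
}

class CachedLookup {
    private final ClusterCache cache;
    private final Function<String, String> loader; // stands in for the expensive query

    CachedLookup(ClusterCache cache, Function<String, String> loader) {
        this.cache = cache;
        this.loader = loader;
    }

    // Cache-aside: try the shared cache first, load and store on a miss.
    String find(String key) {
        Object cached = cache.get(key);
        if (cached != null) {
            return (String) cached;
        }
        String loaded = loader.apply(key);
        if (loaded != null) {
            cache.put(key, loaded);
        }
        return loaded;
    }
}
```

Because both lookups talk to the same cache, a value loaded by one instance is a hit for the other. Swapping vendors then means writing a new ClusterCache adapter, not touching the callers.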

  • Server-side Session
Something we take for granted in a single-server environment is storing application session data on the server. In my experience, mainly with GAE, I encountered multiple issues with session management. Since then, Google has fixed a lot of the issues with the way GAE handles sessions for Java applications. To minimize writing session data to a datastore, we store application state in memory. Most applications are written without any particular vendor in mind, so we use JEE as-is. This approach works if you deploy in any self-hosted clustered environment, but not on Google's PaaS. Google implements its own session management, which is off by default; you therefore need to enable it in appengine-web.xml and make sure that all the objects you store in the session implement the java.io.Serializable interface.
Note: session data is always written synchronously to memcache. If a request tries to read the session data when memcache is not available (or the session data has been flushed), it will fail over to the datastore, which may not yet have the most recent session data. This means that asynchronous session persistence may cause your application to see stale session data. However, for most applications the latency benefit far outweighs the risk.
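A cheap way to catch the Serializable requirement early is to round-trip your session objects through Java serialization in a unit test. The sketch below does that with the standard library only. ShoppingCart and SessionChecks are hypothetical names I made up for illustration; this is not how GAE itself stores sessions, only a local check that an object could be stored.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical session attribute; every field must itself be serializable.
class ShoppingCart implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String customerId;
    private int itemCount;

    ShoppingCart(String customerId) {
        this.customerId = customerId;
    }

    void addItem() {
        itemCount++;
    }

    String getCustomerId() {
        return customerId;
    }

    int getItemCount() {
        return itemCount;
    }
}

class SessionChecks {
    // Serializes and deserializes the value; throws if any part of the
    // object graph cannot be serialized.
    static Object roundTrip(Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Not safely serializable: " + value.getClass(), e);
        }
    }
}
```

Calling SessionChecks.roundTrip(cart) returns an equal copy for a well-formed attribute and throws IllegalStateException for anything that does not implement Serializable, so the problem shows up in a test and not in production.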

  • Event-driven Execution
This is more about running a process at a given time, such as a scheduled task. Again, in an environment you manage yourself, it is straightforward to implement a timer or scheduler service. But this is a clustered environment that you do not manage, and the vendor's stack is different from yours. I personally use Quartz Scheduler when working in a single-server environment. In a clustered environment such as Beanstalk or GAE, it is difficult to know which instance will be triggered, and to make sure the task executes only once. The folks at Google have provided a solution with their own implementation of cron for Java. At the time of writing, Amazon Beanstalk didn't have a solution yet. Therefore, when designing your system, consider beforehand which approach to take to create scheduled tasks for your application.
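If your platform gives you no scheduler, one possible approach is to let every instance fire its own timer and have each one try to claim the run in a shared place, so that only the first claim does the work. The sketch below illustrates that idea. RunRegistry, InMemoryRunRegistry and NightlyReportJob are hypothetical names, and the in-memory registry is an assumption standing in for a shared store with an atomic "insert if absent" operation.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical registry shared by the whole cluster.
interface RunRegistry {
    // Returns true only for the first caller that claims the given run.
    boolean tryClaim(String runId, String instanceId);
}

// Local stand-in; a real one needs an atomic insert-if-absent in a shared store.
class InMemoryRunRegistry implements RunRegistry {
    private final ConcurrentMap<String, String> claims = new ConcurrentHashMap<>();

    @Override
    public boolean tryClaim(String runId, String instanceId) {
        return claims.putIfAbsent(runId, instanceId) == null;
    }
}

class NightlyReportJob {
    private final RunRegistry registry;
    private final String instanceId;
    private int executions;

    NightlyReportJob(RunRegistry registry, String instanceId) {
        this.registry = registry;
        this.instanceId = instanceId;
    }

    // Called by the timer on every instance; only one instance per
    // scheduled slot gets past the claim and does the work.
    boolean runIfFirst(String scheduledSlot) {
        if (!registry.tryClaim("nightly-report:" + scheduledSlot, instanceId)) {
            return false;
        }
        executions++; // the real task would run here
        return true;
    }

    int getExecutions() {
        return executions;
    }
}
```

The slot string (for example a date) identifies one scheduled run. Even with a vendor cron service, keeping the task idempotent like this protects you if a request is delivered twice.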

  • JRE white list
I believe this relates to GAE J only. Google App Engine for Java doesn't allow the use of all the APIs available in Java, especially those that require access to the file system. The fact that Google imposes such a restriction has led us to look elsewhere for some of our projects: the cost of redeveloping our applications to please them is much higher than deploying them elsewhere. Another downside of GAE J is that it doesn't fully support the JEE servlet specification. You cannot implement custom security for your application through your web.xml, which pushes you to use Google's own security mechanism. I would recommend using GAE J when developing a greenfield project that can be built with these restrictions here and here in mind. If you don't mind being locked in to GAE J for your application, then I recommend it as a cost-efficient way of testing your application; otherwise, look somewhere else.
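A small habit that softens the file system restriction is to write code against streams instead of files. The sketch below is only an illustration of that habit and CsvExporter is a hypothetical class: the export logic takes any OutputStream, so the caller decides whether the bytes go to a file, an HTTP response or an in-memory buffer.

```java
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.util.List;

class CsvExporter {
    // Writes one row per line to the given stream. The method never touches
    // java.io.File, so it also works where file writes are not permitted.
    static void export(List<String> rows, OutputStream target) {
        PrintWriter writer =
            new PrintWriter(new OutputStreamWriter(target, StandardCharsets.UTF_8));
        for (String row : rows) {
            writer.print(row);
            writer.print('\n');
        }
        // Flush but do not close: the caller owns the stream.
        writer.flush();
    }
}
```

On a self-hosted server you could pass a FileOutputStream; on a restricted platform you would pass the servlet response stream or a ByteArrayOutputStream and hand the bytes to whatever storage service the vendor provides.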

I hope this was helpful. If there's a mistake, feel free to get back to me and I will make any corrections. Also, I am sure that I am missing some other points; add them in the comments section.

P.S. here is a nice comparison from IBM

Cheers and Happy Coding.



16 comments:

  1. Hi Armel, Have you tried Heroku? I'd love to hear your feedback.

    -James

    1. Hi James, I haven't used Heroku yet. Funnily enough, I was looking at it yesterday to try to understand what it brings to the table. I am also considering Cloud Foundry as an alternative to GAE and Beanstalk. I will let you know once I have tried it.

  2. For points #1 to #4, those are things you have to take care of when you develop applications which are going to be deployed in a cluster. They apply to cloud application development, but they are not new to the cloud. If you do want to cache data, you need some sort of messaging system to synchronize data across multiple servers.

    1. Hi Jun, you're right. Those need to be taken care of in a clustered environment. The point here is that you do not really have the freedom to use whichever framework you want, as it might not be supported by the cloud vendor, which leads to vendor lock-in. Some vendors did not support caching and have only recently implemented a caching mechanism, such as Amazon ElastiCache.

  3. Thanks for the nice intro into the Cloud. I'll just comment on using JEE which is not official (though common): http://www.java.com/en/about/javanaming.jsp "Please say Java"

  4. Programming in cloud..a good start for me..

  5. Good Article ...

  6. I have very minimal knowledge of the cloud. With this article I learnt new things about the cloud from a Java perspective. Good one. Thanks.

  7. I have seen fantastic blogs and I have seen not so fantastic blogs. This blog is very informative in many ways and certainly ranks in the former category. Really appreciate the information you're providing us avid readers!

  9. I am new to the combination of cloud and Java, but in my development experience the cloud has a bright future with Java EE 5 and 6, as they include EAR packaging, which makes provisioning cloud apps easy.

  10. Thanks for sharing useful information. I always make sure to bookmark pages like this because you know it will be useful in the future too. thanks again.

  12. I really believe you will do much better in the future. I appreciate everything you have added to my knowledge base. I admire the time and effort you put into your blog and the detailed information you offer!

  14. Thanks for sharing information for Java developers. These are the best guidelines.
