<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>░░ fzysqr</title>
	<atom:link href="http://fzysqr.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://fzysqr.com</link>
	<description></description>
	<lastBuildDate>Thu, 19 Apr 2012 04:23:14 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>A Smarter FileBasedCache for Django</title>
		<link>http://fzysqr.com/2012/04/07/a-smarter-filebasedcache-for-django/</link>
		<comments>http://fzysqr.com/2012/04/07/a-smarter-filebasedcache-for-django/#comments</comments>
		<pubDate>Sat, 07 Apr 2012 14:30:36 +0000</pubDate>
		<dc:creator>jslatts</dc:creator>
				<category><![CDATA[dev]]></category>
		<category><![CDATA[django]]></category>

		<guid isPermaLink="false">http://fzysqr.com/?p=657</guid>
		<description><![CDATA[A quick follow up on my previous post about a few minor issues we recently experienced with the Django FileBasedCache. I mentioned that we commented out a line of code in the framework in order to get the application running correctly in production. Of course that wasn&#8217;t going to be the permanent fix. The default [...]]]></description>
			<content:encoded><![CDATA[<p>A quick follow up on my previous post about a few <a href="http://fzysqr.com/2012/03/27/django-cache-and-burn/">minor issues</a> we recently experienced with the Django FileBasedCache. I mentioned that we commented out a line of code in the framework in order to get the application running correctly in production. Of course that wasn&#8217;t going to be the permanent fix.</p>

<p><span id="more-657"></span></p>

<p>The default implementation of FileBasedCache calls _cull() on every cache key set. _cull() has a nasty O(n) operation that quickly becomes unwieldy if you let the number of objects in your cache get large. I think the original developer probably figured that the combination of a low default number for max_entries and a cull check that would run frequently would be the best solution for novice developers.</p>

<p>Ok, so maybe our use case is a little weird&#8230; but at least one other person has run into this <a href="http://blog.umlungu.co.uk/blog/2009/jun/25/speed-up-file-caching-in-django/">before</a>. She/He even filed a <a href="https://code.djangoproject.com/ticket/11260">bug!!!</a> Which was conveniently closed as <em>won&#8217;t fix</em>. Great. The docs should at least warn you about this issue&#8230;</p>

<p>I knew we would forget we had patched the framework, so clearly leaving our hack in place was not a good idea. The next update to Django (1.4!) would be sure to clobber it. So we subclassed FileBasedCache and made an overridden version of _cull() that checks for a -1 value for max_entries which indicates that we should skip the cull. We also added a _force_cull() method that takes a max_entries value and will trigger a real cull operation.</p>

<p><em>This goes in your source under app/cache.py:</em></p>

<pre><code>from django.core.cache.backends.filebased import FileBasedCache

class SmartFileBasedCache(FileBasedCache):
    def _force_cull(self, max_entries):
        self._max_entries = int(max_entries)
        return self._cull()

    def _cull(self):
        if int(self._max_entries) == -1:
            return
        return super(SmartFileBasedCache, self)._cull()
</code></pre>

<p><em>Then insert something along the following lines in your settings file. The most important part is the MAX_ENTRIES:</em></p>

<pre><code>CACHES = {
    'default' : {
        'BACKEND'  : 'fzysqr.app.cache.SmartFileBasedCache',
        'TIMEOUT'  : 32000000,
        'LOCATION' : '/mnt/cache/fzysqrapp',
        'KEY_PREFIX' : 'fzysqrapp',
        'OPTIONS'  : {
            'MAX_ENTRIES' : -1
        }
    }
}
</code></pre>
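
<p><em>To see the -1 sentinel at work without a Django install, here is a minimal sketch. StandInFileCache is a hypothetical stand-in for FileBasedCache (it only counts cull calls), so treat this as an illustration of the override pattern, not as Django code:</em></p>

<pre><code>class StandInFileCache(object):
    """Hypothetical stand-in for FileBasedCache; not Django code."""

    def __init__(self, max_entries):
        self._max_entries = max_entries
        self.cull_calls = 0

    def _cull(self):
        # The real backend walks the cache directory here.
        self.cull_calls += 1


class SmartStandInCache(StandInFileCache):
    def _force_cull(self, max_entries):
        self._max_entries = int(max_entries)
        return self._cull()

    def _cull(self):
        if int(self._max_entries) == -1:
            return
        return super(SmartStandInCache, self)._cull()


cache = SmartStandInCache(max_entries=-1)
cache._cull()                 # skipped: the sentinel is -1
skipped = cache.cull_calls
cache._force_cull(5000000)    # runs the parent cull once
forced = cache.cull_calls
</code></pre>

<p>Note that _force_cull() leaves the new max_entries value in place, so later _cull() calls on the same object would run for real. That is harmless in a short-lived manage.py process, but worth knowing if you call it from a long-running one.</p>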

<p>Then, we created a manage.py command that we can call from cron on a schedule.</p>

<p><em>This goes in app/management/commands/force_cache_cull.py:</em></p>

<pre><code>from django.core.management.base import NoArgsCommand
from django.core.cache import cache as cache_backend
from optparse import make_option

class Command(NoArgsCommand):
    # default=-1 (not None) so that int() below never receives None
    option_list = NoArgsCommand.option_list + (
        make_option('--max_entries', '-m', default=-1, dest='max_entries', type='int',
            help='Cull the cache if it holds at least this many entries (-1 skips the cull).'), )

    def handle_noargs(self, **options):
        max_entries = options.get('max_entries', -1)
        return cache_backend._force_cull(int(max_entries))
</code></pre>

<p><em>The crontab looks something like:</em></p>

<pre><code>5 0 1 * * python /path/to/app/manage.py force_cache_cull --max_entries 5000000 &gt; /dev/null 2&gt;&gt; /var/log/app_cache_cull.log
</code></pre>

<p>That&#8217;s it! We set up a few CloudWatch metrics to track our cache size and washed our hands of the whole issue. Hope this helps someone avoid the nasty surprise we had.</p>
]]></content:encoded>
			<wfw:commentRss>http://fzysqr.com/2012/04/07/a-smarter-filebasedcache-for-django/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Update to VimRepressed</title>
		<link>http://fzysqr.com/2012/04/05/update-to-vimrepressed/</link>
		<comments>http://fzysqr.com/2012/04/05/update-to-vimrepressed/#comments</comments>
		<pubDate>Fri, 06 Apr 2012 04:26:08 +0000</pubDate>
		<dc:creator>jslatts</dc:creator>
				<category><![CDATA[dev]]></category>
		<category><![CDATA[vim]]></category>

		<guid isPermaLink="false">http://fzysqr.com/?p=681</guid>
		<description><![CDATA[So apparently OSX Lion and Vim plugins implemented in python are not playing well together. Calls to sys.stdout.write() and sys.stderr.write() appear to be crashing vim core. I say appear because it is nearly impossible to debug. Rather than bang my head against the wall all night or do some brew compiling gymnastics, I just switched the [...]]]></description>
			<content:encoded><![CDATA[<p>So apparently OSX Lion and Vim plugins implemented in python are <a href="http://code.google.com/p/macvim/issues/detail?id=370&amp;q=segv">not playing well together.</a></p>

<p>Calls to sys.stdout.write() and sys.stderr.write() appear to be crashing vim core. I say <em>appear</em> because it is nearly impossible to debug. Rather than bang my head against the wall all night or do some brew compiling gymnastics, I just switched the stdout/stderr calls to use an even more pythonic style:</p>

<pre><code>print &gt;&gt; sys.stderr, 'Blah'
</code></pre>
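
<p>The print &gt;&gt; form above is Python 2 only. If you want the same call to also run on Python 3, the print function is the portable spelling. This is a minimal sketch: report() is a hypothetical helper name, and I have not checked whether this form sidesteps the Vim crash the same way.</p>

<pre><code>from __future__ import print_function
import sys

def report(message, stream=None):
    # Resolve sys.stderr at call time so a replaced stream is respected.
    if stream is None:
        stream = sys.stderr
    print(message, file=stream)

report('Blah')
</code></pre>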

<p>Works great. Get your fresh code fixes <a href="https://github.com/jslatts/VimRepressed/commit/0020805f6dd5b3b9ad97864ae78f8f0bf5df07f0">here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://fzysqr.com/2012/04/05/update-to-vimrepressed/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Django Cache and Burn</title>
		<link>http://fzysqr.com/2012/03/27/django-cache-and-burn/</link>
		<comments>http://fzysqr.com/2012/03/27/django-cache-and-burn/#comments</comments>
		<pubDate>Tue, 27 Mar 2012 14:59:56 +0000</pubDate>
		<dc:creator>jslatts</dc:creator>
				<category><![CDATA[dev]]></category>
		<category><![CDATA[AWS]]></category>
		<category><![CDATA[caching]]></category>
		<category><![CDATA[django]]></category>

		<guid isPermaLink="false">http://fzysqr.com/?p=640</guid>
		<description><![CDATA[As part of our Amazon EC2 design, we decided to use S3 as our persistence layer; durability and easy storage growth for our data is very important to us. We knew that S3 would be too slow on its own for real time access. Luckily for us, the use profile of our application fits perfectly [...]]]></description>
			<content:encoded><![CDATA[<p>As part of our Amazon EC2 design, we decided to use S3 as our persistence layer; durability and easy storage growth for our data is very important to us. We knew that S3 would be too slow on its own for real time access. Luckily for us, the use profile of our application fits perfectly with a write-through cache strategy. Our users generally need a single dataset accessible for a week or two before they finish their analysis and move on, most likely never to access it again. Django&#8217;s file based cache seemed ideal. We could point it at the generous (and otherwise unused) instance storage on our EBS backed AMIs and cache several hundred gigabytes of data. If we lose a server there would be a performance penalty as the working dataset was built back up through a read-through strategy, but nothing unacceptable for a short period of time.</p>

<p><span id="more-640"></span></p>
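
<p>For anyone unfamiliar with the terms, here is a minimal sketch of the write-through and read-through idea. The two dicts are stand-ins for S3 and the file based cache, and WriteThroughStore is a hypothetical name. It is not our production code, just the shape of the strategy:</p>

<pre><code>class WriteThroughStore(object):
    """Sketch: a fast local cache in front of a slow, durable store."""

    def __init__(self, durable, cache):
        self.durable = durable   # stand-in for S3
        self.cache = cache       # stand-in for the file based cache

    def put(self, key, value):
        # Write-through: persist first, then populate the cache.
        self.durable[key] = value
        self.cache[key] = value

    def get(self, key):
        # Read-through: on a miss, fetch from the durable store and
        # repopulate the cache so the next read is fast.
        if key in self.cache:
            return self.cache[key]
        value = self.durable[key]
        self.cache[key] = value
        return value


s3 = {}
local = {}
store = WriteThroughStore(s3, local)
store.put('dataset-1', b'rows')
local.clear()                      # simulate losing a server
value = store.get('dataset-1')     # slow path, then cached again
</code></pre>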

<h2>Where there is smoke&#8230;</h2>

<p>We launched. The app was up and running and everything seemed fine. After two weeks of operation, we had accumulated a few performance complaints that we initially chalked up to placebo effect on our customers. After all, we had announced the upgrade publicly, so any perceived slowdown would easily be associated with the change. We see this all the time.</p>

<p>We do take performance seriously and we want our customers to have a good experience, so we started a cursory investigation anyway. Our office Internet connection is anemic at the best of times and borderline unusable during the day, so I took a field trip to Starbucks where I sat down with a coffee and hit the app as a &#8220;real person&#8221; might. It was an abysmal experience. So bad in fact that I started to panic. Were we really forcing our users to endure this crap? I raced back to the office to get ready for some serious damage control. When I arrived, I found (to my surprise) no new complaints. In fact, the app was performing fine from the office. Turns out that the local Starbucks has a reputation for terrible wifi.</p>

<p>No experiment can be concluded with one data point, so a few of us tried it again that night from our homes. The app absolutely screamed! Armed with the knowledge that the app was fast for us, under low load, I began to suspect that our cache implementation was buggy and we were actually missing more frequently than we realized. I sat down with one of my devs and started getting ready to add some instrumentation into our caching code to make sure we were actually utilizing our massive file cache. Speaking of which, how massive is it?</p>

<pre><code>$ du -s -h -H /mnt/file_cache/
33M     /mnt/file_cache/
</code></pre>

<p>33 MB!? After two weeks of operation!? We were clearly not storing as much in the cache as we intended. After a quick round of debugging, we made a fun discovery in the Django docs: FileBasedCache only stores 300 objects by default, regardless of what the expiration time is set to. We did load test our EC2 environments, but not in a way that exercised cache turnover. We sheepishly put in the additional config values (5,000,000 entries, stored for up to 1 year) and the cache size immediately shot up to a couple of gigabytes. We were aiming to have around 200GB of working cache at any given time. We went home that day confident that we had made the app even faster and better for our customers.</p>
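
<p><em>Roughly what those config values look like in a settings file. This is a sketch rather than our exact settings: the ONE_YEAR name is just for readability, and the LOCATION is taken from the du output above.</em></p>

<pre><code>ONE_YEAR = 60 * 60 * 24 * 365  # 31,536,000 seconds

CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.filebased.FileBasedCache',
        'LOCATION': '/mnt/file_cache',
        'TIMEOUT': ONE_YEAR,
        'OPTIONS': {
            'MAX_ENTRIES': 5000000,  # the default is 300
        },
    }
}
</code></pre>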

<h2>Premature confidence is the root of all evil</h2>

<p>Around 8AM the next day, one of the web servers threw a high CPU alert and promptly dropped out of the load balancer pool a few minutes later. Uh oh.</p>

<p>Before I could even bring up top on the afflicted box, it cleared its alarms and jumped back into the pool. A few minutes after that, the other web server threw a high CPU alert and dropped. Clearly something was amiss. The metrics showed that we were using way more CPU than typical across our web servers and that the spike in CPU usage was directly correlated to serving requests. Top showed that the culprit was apache and that our servers were simply too loaded to respond to new requests.</p>

<p><em>The last day was post-cache fix. Something isn&#8217;t right&#8230;</em>
<img title="Pre-patch CPU Usage" src="http://fzysqr.com/wp-content/uploads/2012/03/Pre-patch.png" class="aligncenter" /></p>

<p>The immediate suspicion was that our file caching fix was the culprit. It was, after all, the only change we had made to the environment in over a week. But it was hard to believe that cutting out constant S3 requests would actually be slower than servicing the requests from our cache. It was at this point we started to question the caching implementation itself. Just exactly how was Django handling these requests? It only took a few seconds to pull up the code (filebased.py in the core/cache/backends dir) and see the answer to our problem staring us in the face.</p>

<p><em>Interesting, _cull() is called in each call to set():</em></p>

<pre><code>def set(self, key, value, timeout=None, version=None):
    key = self.make_key(key, version=version)
    self.validate_key(key)

    fname = self._key_to_file(key)
    dirname = os.path.dirname(fname)

    if timeout is None:
        timeout = self.default_timeout

    self._cull()    # Interesting, what is going on here?

    try:
        if not os.path.exists(dirname):
            os.makedirs(dirname)

        f = open(fname, 'wb')
        try:
            now = time.time()
            pickle.dump(now + timeout, f, pickle.HIGHEST_PROTOCOL)
            pickle.dump(value, f, pickle.HIGHEST_PROTOCOL)
        finally:
            f.close()
    except (IOError, OSError):
        pass
</code></pre>

<p><em>Wonder how cull works?</em></p>

<pre><code>def _cull(self):
    if int(self._num_entries) &lt; self._max_entries:    # This should be true for us...
        return

    try:
        filelist = sorted(os.listdir(self._dir))
    except (IOError, OSError):
        return

    if self._cull_frequency == 0:
        doomed = filelist
    else:
        doomed = [os.path.join(self._dir, k) for (i, k) in enumerate(filelist) if i % self._cull_frequency == 0]

    for topdir in doomed:
        try:
            for root, _, files in os.walk(topdir):
                for f in files:
                    self._delete(os.path.join(root, f))
        except (IOError, OSError):
            pass
</code></pre>

<p><em>Well, we had max_entries set very high. Surely we were returning before trying to sort the file list&#8230;</em></p>

<pre><code>def _get_num_entries(self):
    count = 0
    for _,_,files in os.walk(self._dir):      # Argh! Burn it with fire!
        count += len(files)
    return count
_num_entries = property(_get_num_entries)
</code></pre>

<p>Ouch. A nasty O(n) operation on every set() call by way of _get_num_entries(). Whoops. Seems like our use case might not have been what the maintainers had in mind when they wrote these classes. The servers were collapsing under their own weight at a measly 12,000 cached objects. We certainly weren&#8217;t going to be able to handle 5 million. We quickly commented out the offending line of code (making set an O(1) operation) and watched our load plummet.</p>
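
<p><em>To put a number on it, here is a small sketch. count_entries() has the same shape as _get_num_entries(), run against a throwaway temp directory, and files_visited() is a hypothetical back-of-the-envelope helper that ignores cache growth between calls:</em></p>

<pre><code>import os
import shutil
import tempfile

def count_entries(cache_dir):
    # Same shape as _get_num_entries(): visit every file under cache_dir.
    count = 0
    for _, _, files in os.walk(cache_dir):
        count += len(files)
    return count

def files_visited(existing_entries, set_calls):
    # Each set() triggers one full count, so the work is the product.
    return existing_entries * set_calls

cache_dir = tempfile.mkdtemp()
try:
    for i in range(50):
        subdir = os.path.join(cache_dir, '%02d' % (i % 5))
        if not os.path.isdir(subdir):
            os.makedirs(subdir)
        open(os.path.join(subdir, 'entry%d' % i), 'wb').close()
    counted = count_entries(cache_dir)
finally:
    shutil.rmtree(cache_dir)

# 12,000 cached objects and 100 set() calls
visited = files_visited(12000, 100)
</code></pre>

<p>At 12,000 objects, a mere 100 calls to set() means about 1.2 million directory entries examined. At 5 million objects, a single set() would walk all 5 million.</p>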

<p><em>Ahhh. Much better. Almost half the load we had before this whole ordeal started!</em>
<img title="Post-patch CPU Metrics" src="http://fzysqr.com/wp-content/uploads/2012/03/Post-patch.png" class="aligncenter" /></p>

<p>The more permanent fix went in last night. We simply subclassed FileBasedCache to skip the cache cull check altogether when max_entries is -1. We then created a cron job to call our manage.py force_cache_cull command. It will run once a month at midnight.</p>

<h2>Lessons Learned</h2>

<p>In the end we had two positive outcomes. First, we netted a big performance increase for our customers. Second, we saw that our application infrastructure was resilient enough to deal with the non-responsive web servers <em>without</em> losing availability for our logged on users.</p>

<p>We learned a few lessons over the last couple days:</p>

<ul>
<li>Caching is hard.</li>
<li>Don&#8217;t assume that open source maintainers are omniscient. They probably didn&#8217;t have your exact use case in mind. Read the code for critical infrastructure before you rely on it.</li>
<li>Make sure your code is as pre-instrumented as possible. Having to try and get additional debug level logging in place was a huge waste of time and introduced additional risks to our app.</li>
<li>Don&#8217;t be too quick to discount feedback from your customers/users, no matter how confident you are!</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://fzysqr.com/2012/03/27/django-cache-and-burn/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>A guide to hosting your HIPAA app in Amazon Web Services</title>
		<link>http://fzysqr.com/2012/03/15/a-guide-to-hosting-your-hipaa-app-in-amazon-web-services/</link>
		<comments>http://fzysqr.com/2012/03/15/a-guide-to-hosting-your-hipaa-app-in-amazon-web-services/#comments</comments>
		<pubDate>Thu, 15 Mar 2012 13:26:07 +0000</pubDate>
		<dc:creator>jslatts</dc:creator>
				<category><![CDATA[dev]]></category>

		<guid isPermaLink="false">http://fzysqr.com/?p=612</guid>
		<description><![CDATA[In November, my team and I began an epic journey. We left our long-time homes in the bountiful lands of physical hosting in search of the mystical realm of &#8220;the clouds&#8221;. Cost of living in physical land was simply out of control. Last weekend, we arrived. We left Rackspace for Amazon Web Services. Sorry rackers! [...]]]></description>
			<content:encoded><![CDATA[<p>In November, my team and I began an epic journey. We left our long-time homes in the bountiful lands of <em>physical hosting</em> in search of the mystical realm of &#8220;the clouds&#8221;. Cost of living in physical land was simply out of control.</p>

<p>Last weekend, we arrived.</p>

<p><span id="more-612"></span></p>

<p>We left Rackspace for Amazon Web Services. Sorry rackers! Your customer service was great, and your core cloud services seem nice, but Amazon&#8217;s pricing structure is way better and they just keep rolling out amazing features.</p>

<p>The good news is that we made it. We launched all of our apps into production on AWS last weekend. With thousands of little challenges (and a few huge ones) behind us, I wanted to get my thoughts into a post before it all blurs into the next big project.</p>

<p>One of the biggest challenges we faced was finding a solid architecture for hosting our app on Amazon. Our needs are fairly straightforward: high availability, high performance (page load), no need for scaling, easily managed by a small dev team with no ops folks. Oh, yeah, and be fully <a href="http://www.hhs.gov/ocr/privacy/hipaa/administrative/enforcementrule/hitechenforcementifr.html">HIPAA HITECH</a> compliant.</p>

<p>We had all sorts of questions: How do you handle load balancing? Multi-zone failover? Encryption? Backups? Monitoring? Replication? Recovery? Security? Having little cloud experience ourselves, we turned to the internet for help. Most of the information we found fell into two categories: Big Time Serious Guys Doing Big Time Serious Stuff (think Netflix) or Sounds Like Us, But No Freaking Details. Maybe our needs are unique (I doubt it) or maybe we are just a little ahead of the curve with some of the features we are using (like VPCs). In the end, we borrowed from everyone and rolled it all up into a pretty great solution.</p>

<p>This post is about what we did, what worked for us, what didn&#8217;t. Little lessons that kicked our ass along the way. If you have similar needs, you can think of this as a high-level blueprint for your entire environment. If there is enough interest, I will post code and detailed implementation details in subsequent articles.</p>

<p>To get us started, our requirements were:</p>

<ul>
<li>HIPAA HITECH compliance. For us, this means tons of crypto</li>
<li>Availability-Zone (AZ) redundancy without downtime for customer facing components </li>
<li>No data loss from single AZ failure</li>
<li>&lt; 1 hour of data loss from full region failure</li>
<li>Improved security over our physical hosting </li>
<li>Improved performance over our physical hosting</li>
<li>Save massive amounts of money  </li>
<li>Fully automate code deployments and mostly automate infrastructure build out</li>
</ul>

<h2>Plan for the worst, hope for something less than the worst</h2>

<p>Everyone is going to have different uptime requirements and corresponding budgets. We thought through the failure scenarios and came up with some likely ones that we needed to address:</p>

<ol>
<li>Application failure. Data corrupting bugs, crashes, poor scale, whatever. These are problems we need to solve within the scope of the application itself.</li>
<li>Isolated hardware failure. This means a cluster dropped offline, killing one of our EBS volumes or running instances.</li>
<li>Entire availability zone failure. Something went bad with an upgrade and took down the entire zone. Or maybe another replication storm. Either way, we have lost access to everything in that zone.</li>
<li>Entire region failure. Someone nuked a data center. It&#8217;s a smoking crater in the ground. Nothing left.</li>
</ol>

<p>For our business&#8217;s requirements (and our budget), we decided that we should attempt to survive 1, 2, and 3 with no customer impact (zero downtime and zero data loss). In the case of an entire region failure, we accepted that our application would be offline for a period of time while we rebuild the environment in another region. In this scenario, we do not want to lose more than an hour of data, but can tolerate a lag time in availability of that data.</p>

<p>You have to deal with number one (which is essentially software design and quality) no matter how you choose to host. We used the standard approaches for this: good development practices such as code reviews, testing (manual and automated), continuous integration, automated deployments, and backups (we use EBS snapshots for this).</p>

<p>In the second scenario, we can potentially lose several different node types: a web server, a database server, the NAT/VPN box, or a load balancer (ELB). Our design ensures availability as long as we have one of <em>each</em> type available. If we lose a web server, the ELB will automatically drop it from the pool. If we lose a database server, our web servers will automatically fail over to the backup database server. ELBs are designed to be redundant by default.</p>

<p>The third scenario is similar to the second. As long as we ensure that we have the full app stack on <em>each</em> availability zone, then we can lose an entire zone and let the load balancer redirect traffic to the surviving one.</p>

<p>The fourth scenario requires a more dynamic strategy. We do not want to bear the financial and administrative burdens of spanning regions (or cloud providers), so we are willing to tolerate downtime in the (hopefully) unlikely scenario that Amazon loses an entire region. Our strategy to handle this type of failure is something like this:</p>

<ol>
<li>Assess the extent of the failure based on available information and estimate how long the app will be unavailable.</li>
<li>If the outage looks likely to last more than a few minutes, redirect DNS to a failure page hosted elsewhere with a short TTL.</li>
<li>If the outage looks likely to be for longer than a few hours, begin bringing up a new environment in a non-affected region using our cold failover VMs and/or utilizing EC2 snapshots. In the event of a complete Amazon failure (all regions down everywhere), we would start the build out on another provider. </li>
</ol>

<p>In a major outage, the time to recover will vary wildly with the availability of various AWS services. If EC2 is down, but S3 is available, our recovery times will be considerably quicker than if we are not able to spool up new VMs, load balancers, etc. You can partially mitigate this with cold VMs and load balancers but there are costs associated with doing so.</p>

<p>It is important to remember that this is what worked for us given our requirements. If you need to survive region failures, you are going to have to go far beyond what we have done here.</p>

<h2>Regions and Availability Zones</h2>

<p>Before we go any farther, we must discuss geography in Amazon&#8217;s cloud. Amazon&#8217;s various web services are available across multiple regions. Regions are data centers which are geographically dispersed from each other. Amazon considers traffic between regions to be equivalent to regular internet traffic and you are charged as such.</p>

<p>By contrast, &#8220;Availability Zones are distinct locations that are engineered to be insulated from failures in other Availability Zones and provide inexpensive, low latency network connectivity to other Availability Zones in the same Region.&#8221; In other words, they are somewhat physically isolated even though they reside in the same data center. I would imagine they have fully separate (and redundant) power systems, networks, etc. Yet, they are close enough together to be connected at very high-speeds.</p>

<blockquote>
  <p>Did you catch the <em>imagined</em> bit? Amazon provides very few concrete details around their implementation of their services. Perhaps the biggest requirement placed on <em>you</em> for moving onto Amazon&#8217;s cloud is that you have to extend quite a bit of trust to them. If you do not trust them, the burden is on you to engineer your way around the issue. Realizing savings in the cloud means trusting one or more 3rd parties. The less you trust them, the less you save.</p>
</blockquote>

<p>We need to care about availability zones for two reasons:</p>

<ol>
<li><p>AWS provides tooling and support for spanning availability zones, but not regions. It is possible to build an &#8220;off the shelf&#8221; VPC that spans two or more AZs through the web interface. If you want to span regions, you are on your own and it&#8217;s gonna cost you more too.</p></li>
<li><p><a href="http://aws.amazon.com/ec2-sla/">Amazon&#8217;s EC2 SLA</a> specifically says that you must have unavailable instances in more than one AZ for that region to be considered unavailable, thus qualifying you for compensation. Amazon does not consider a region &#8220;down&#8221; unless more than one AZ is toast. Starting to catch my drift? They&#8217;ve told you to run your app across availability zones and not to come crying to them if you are caught out when one goes down.</p></li>
</ol>

<p>Amazon is practically telling you that AZs can fail. You should assume they are going to fail. They will fail. They have failed. Go read about a <a href="http://aws.amazon.com/message/65648/">big example</a> of this type of failure right now. I&#8217;ll wait.</p>

<p>Scared yet? You should be. A healthy fear of the cloud is important for success. You are about to add a massive layer of abstraction and only the prepared will stay up in the next great AWS outage.</p>

<h2>Virtual Private Clouds</h2>

<p>The foundation of our design starts with a rather new AWS feature: <a href="http://aws.amazon.com/ec2/#features">virtual private clouds</a> (VPC). A VPC gives you the ability to have your own subnets that use reserved private network IP blocks (192.168.0.0/16, 10.0.0.0/8). These IP ranges are <em>not internet routable</em>, meaning that a remote client cannot even reach hosts on your VPC subnets&#8211;a huge security benefit. In addition to private IPs, VPCs offer quite a few other security improvements, such as bi-directional security group rules, network ACLs, and VPN access points.</p>

<p>VPCs will complicate your environment. Each host will no longer have direct internet access and will have to use network address translation (NAT) for any outbound traffic that originates from inside the environment (like software updates). You will also have to handle routing and network level access controls. There are also pricing impacts. Hosts on a VPC will not have direct access to S3 via sideband interfaces and will incur bandwidth costs when accessing S3 that regular EC2 hosts will not. With one of our guiding goals being to maximize security of our patient health information, using a VPC was an easy choice.</p>

<p>For our design, we opted for a class B network with four subnets spanning two AZs: two DMZs and two private. This allowed us to have both a DMZ and a private subnet on each of our AZs. The <a href="http://en.wikipedia.org/wiki/DMZ_(computing)">DMZ</a> is where any public facing hosts live. In our case, that means our elastic load balancers and our NAT/VPN host. The private subnets are where our web servers and database servers are located. The logical separation is a benefit, but the main reason we need the DMZ subnets is routing. Elastic load balancers will inherit their routing rules from the VPC subnets they face. In order to make NAT work correctly, the private subnet has a default route to the NAT host (outbound traffic must be NAT&#8217;ed). For the ELBs and the NAT/VPN host to return responses to incoming traffic, they must route packets to the internet gateway. Thus, the DMZs use the internet gateway (IGW) for their default route and the private subnets use the NAT host.</p>
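
<p>As a rough illustration of that layout, here is a sketch that carves a /16 into four subnets and pairs each with its default route. The 10.0.0.0/16 block, the /24 subnet size, and the subnet and zone names are all assumptions made up for the example, not our real address plan. It uses the Python 3 standard library ipaddress module.</p>

<pre><code>import ipaddress
from itertools import islice

# Assumed address plan; the real CIDR blocks are not given in this post.
vpc = ipaddress.ip_network('10.0.0.0/16')
blocks = list(islice(vpc.subnets(new_prefix=24), 4))

# One DMZ and one private subnet per availability zone.
layout = [
    ('dmz-a',     'zone-a', 'internet gateway'),
    ('private-a', 'zone-a', 'NAT host'),
    ('dmz-b',     'zone-b', 'internet gateway'),
    ('private-b', 'zone-b', 'NAT host'),
]

subnets = []
for block, (name, zone, default_route) in zip(blocks, layout):
    subnets.append({
        'name': name,
        'zone': zone,
        'cidr': str(block),
        'default_route': default_route,
    })

for subnet in subnets:
    print('%(name)-10s %(zone)s %(cidr)-12s 0.0.0.0/0 via %(default_route)s' % subnet)
</code></pre>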

<p>I have mentioned the NAT/VPN host a few times here. I will explain that now. As discussed above, outbound traffic from inside this environment, such as software updates, git pulls, mail, or DNS resolution, must traverse the network through a NAT device. Amazon makes this easy by providing an AMI pre-configured to handle this duty. This does mean that you will have to eat the cost of another instance running 24/7. Luckily, it can be a small, maybe even a micro if they ever release those for VPCs. Another consequence of the VPC is that you will have to VPN to the network in order to directly access the hosts for administrative activities (unless you set up some port forwarding&#8211;not recommended for security). We decided to use OpenVPN and run it off the NAT device in order to avoid spinning up another 24/7 instance.</p>

<p><strong>This is our general design.</strong>
<img title="Overall Design" src="http://fzysqr.com/wp-content/uploads/2012/03/Overview.png" width="600" class="size-medium aligncenter" /></p>

<p>The NAT/VPN host is in its own security group that allows inbound traffic from our corporate IP range for VPN access. It allows outbound traffic to any IP for NAT duty.</p>

<p>One last point on VPCs. Amazon states that your VPC traffic is isolated from everyone else. You can probably take this at face value; however, they do not provide any specifics about how this is accomplished. In order for us to meet our HIPAA HITECH goal, we needed to be able to guarantee that our PHI will not be exposed. This meant we needed to know how VPC traffic isolation is achieved if we were to rely on it. We tried reaching out to AWS support and they tried to be helpful, but ultimately they cannot provide specific details, let alone a documented implementation.</p>

<p>This led us to a core design decision that impacted every aspect of our implementation: <strong>all traffic and all data at rest that contains PHI must be encrypted, even traffic within the VPC.</strong></p>

<h2>Elastic Load Balancers (ELB)</h2>

<p>There are only two ways for traffic originating <em>outside</em> our environment to enter: through an elastic load balancer, or through the NAT/VPN host. Each of our hosted applications has a load balancer assigned to it with a minimum of two web hosts (in two different AZs) assigned to its pool. The load balancers share a security group that allows only inbound (from anywhere) and outbound (to anywhere) TCP traffic on ports 80 and 443. In addition to balancing the traffic across hosts in the pool, the ELB will take a host out of the pool automatically if it fails. This is handled using a health check (HTTP GET) to a route in our apps.</p>
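
<p>A health check route can be very small. Here is a minimal sketch as a plain WSGI callable. The /health path and the health_check_app name are hypothetical (our real route is not shown here), and a production check would probably also verify things like database connectivity.</p>

<pre><code>def health_check_app(environ, start_response):
    """Minimal WSGI app: answer 200 on the health route, 404 elsewhere."""
    is_get = environ.get('REQUEST_METHOD') == 'GET'
    if is_get and environ.get('PATH_INFO') == '/health':
        status, body = '200 OK', b'ok'
    else:
        status, body = '404 Not Found', b'not found'
    start_response(status, [
        ('Content-Type', 'text/plain'),
        ('Content-Length', str(len(body))),
    ])
    return [body]
</code></pre>

<p>The load balancer only cares about the status code: anything other than a 200 within the timeout counts as a failed check.</p>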

<p>We also use the ELBs to terminate client SSL connections, so they are configured with our signed certificates and corresponding signing chains. The ELBs then initiate new SSL connections to the web servers in their pools to forward traffic. These &#8220;inside&#8221; SSL connections do not require signed certificates, as we control both sides of the connection.</p>

<h2>Route53</h2>

<p>The IP addresses behind an ELB can change at any time. AWS does this for load balancing and availability reasons. Instead of directing your traffic to a single IP address as you would with a traditional load balancer, you are given a DNS name with a short TTL to send your traffic to. Typically, you simply add a CNAME record at your DNS provider so that your customers see your domain.</p>

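<p>One interesting consequence of this strategy is that you cannot direct a zone apex, like fzysqr.com, to a CNAME. It is against the DNS spec: the apex has to carry SOA and NS records, and a CNAME cannot coexist with other records at the same name. You have to use Amazon&#8217;s Route53 DNS service, which can alias the apex to an ELB, to get around the issue.</p>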

<h2>Web and database nodes</h2>

<p>Our products are fairly standard web apps. At a minimum, each needs one web server node and one database server node. In order to meet our fail-over requirements, we went with two web nodes per app and two database servers that are shared across all the apps. We decided to share the database nodes between our apps for cost purposes.*</p>

<p><em>*Protip: Build a spreadsheet model based on your current and expected usage. For us, it was extremely helpful to determine our cost/performance/complexity trade-off curves.</em></p>

<h3>Instance Types</h3>

<p>Amazon provides many different <a href="http://aws.amazon.com/ec2/instance-types/">instance types</a> with more becoming available each month. Each instance type can be &#8220;backed&#8221; by either elastic block stores (EBS) or instance storage. Instance storage is ephemeral, meaning when the instance stops, the data is lost. EBS is distributed, fault tolerant, theoretically faster (debatable), and more importantly, it hangs around even if you stop the VM. Even if you choose to use EBS backed VMs (we did) you still have access to the instance storage if you want it. This can be useful, because you get several hundred GB for &#8220;free&#8221; with each VM.</p>

<h3>Web servers</h3>

<p>For web servers, we ended up with EBS backed c1.medium instances. These are considered &#8220;High-CPU&#8221;. We initially tried to get away with m1.smalls, but during load testing, we found that a small would fall over at around 15 concurrent users loading our slowest page repeatedly. Under load, the small instances were reporting 50% <a href="http://rackerhacker.com/2008/11/04/what-is-steal-time-in-my-sysstat-output/">steal time</a> in top. This meant that our app was getting blocked by other guests on the cluster. We also theorized that our VMs were on a more fully allocated cluster and were getting shorted on network and disk I/O as well. Bumping the size to c1.mediums allowed our app to scale up to 50 concurrent users without breaking a sweat and would easily handle our normal usage patterns.</p>

<p>The web servers are fairly straightforward Apache/mod_wsgi/Django setups. Each application has two, one for each availability zone. The load balancer distributes traffic using a round robin algorithm. There are three special features that we added for the AWS deployment.</p>

<ol>
<li><p>Each app is configured to use the instance storage for its Django cache. This allowed us to cache data stored in S3 on disk, reducing S3 usage costs and increasing performance. Having independent caches on each web node presented a new problem with cache consistency and invalidation across the load balanced pool. We ended up writing an API for the servers to remotely invalidate each other&#8217;s cache to solve this problem.</p></li>
<li><p>Each app needed to be pointed to a primary database server, yet able to fail over to the secondary. To handle this, each web server runs its own instance of <a href="http://haproxy.1wt.eu/">haproxy</a> that is configured via a chef recipe to monitor availability of our database servers. The Django app is set to send all database traffic to localhost which is then forwarded on to a database server. Because each app has its own haproxy instance, we avoid having a single point of failure without having to run a MySQL cluster. The haproxy setup was based on Alex Williams&#8217; <a href="http://www.alexwilliams.ca/blog/2009/08/10/using-haproxy-for-mysql-failover-and-redundancy/">guide</a> with our own tweaks to the mysqld status scripts.</p></li>
<li><p>The apps are configured to send all database traffic over SSL. Remember, we don&#8217;t trust the VPC.</p></li>
</ol>

<h3>Databases</h3>

<p>Our database servers are m1.larges. We run MySQL in a master-master replication similar to the technique described in this <a href="http://www.neocodesoftware.com/replication/">tutorial</a>. A nifty trick to avoid auto-incremented key collisions is to have each server either use even or odd integers only when allocating new keys.</p>
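
<p>The even/odd trick can be sketched in a few lines of Python. This is only an illustration of the allocation scheme, not our actual configuration; in MySQL itself the relevant knobs are the <em>auto_increment_increment</em> and <em>auto_increment_offset</em> server variables. The <em>key_sequence</em> helper is a made-up name for the example.</p>

<pre><code>from itertools import count, islice


def key_sequence(offset, increment=2):
    """Return an iterator over offset, offset + increment, ..."""
    return count(offset, increment)


# Server A takes the odd keys, server B takes the even keys.
server_a = list(islice(key_sequence(1), 5))
server_b = list(islice(key_sequence(2), 5))

print(server_a)  # [1, 3, 5, 7, 9]
print(server_b)  # [2, 4, 6, 8, 10]
# The two sets never overlap, so concurrent inserts cannot collide.
print(set(server_a).isdisjoint(server_b))  # True
</code></pre>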

<p>We had initially hoped to distribute our writes across both database servers. We quickly found that this wreaked havoc on session state stored in the databases and would frequently cause replication conflicts. ELBs have a stickiness feature that <em>mostly</em> solved this problem for us. However, occasionally the ELB would seem to forget that it was supposed to stick, breaking replication and causing a huge mess. We tried a few different approaches at tackling this issue before we decided that a <em>proper</em> solution to distribute database writes would require sharding, which we did not have time for. Instead we opted to balance our load by allocating each application a primary and secondary database server. The secondary would only be used in the event of the primary becoming unavailable. This meant a somewhat cruder load distribution, but at our scale the simplicity was a good compromise. We kept the master-master replication in place so that the servers could hot fail-over without human or scripted intervention.</p>

<p>Because we are paranoid freaks, we knew we would have to encrypt our databases. We accomplished this by locating the database stores on an encrypted EBS volume. We used <a href="http://www.saout.de/misc/dm-crypt/">dm-crypt</a> to handle the encryption. We wrote some nifty scripts to mount the volumes and start up mysqld when the server is booted. Yes, you have to type in the password every time the server reboots. Yes, if you lose that password, you are toast. Don&#8217;t lose it.</p>

<p>We also set up the master-master replication to work over <a href="http://dev.mysql.com/doc/refman/5.1/en/replication-solutions-ssl.html">SSL</a>. Trust no one!</p>

<h2>Simple Storage Service (S3)</h2>

<p>By far our favorite AWS tool is S3. We have to hold on to our customer data indefinitely and it can grow to be quite large. S3 means <em>never having to worry about running out of space</em>. The service is awesome, and cheap to boot, though not quite fast enough to replace local storage.</p>

<p>To get around potential performance problems, we ported our app to use S3 as primary storage with the VM instance storage as a giant local cache. When a client requests a dataset not in the cache, the app fetches it from S3 and stores it locally on the VM for up to a year. Since we store the datasets in the cache <em>when they are uploaded</em>, the vast majority of the time, our users will never even experience a cache miss. This strategy has a nice side effect. S3 only charges data transfer fees on reads; writes are free. Reducing our reads to almost zero means cheaper bills for us.</p>
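
<p>Here is a rough sketch of that read-through pattern, not our production code. The <em>LocalCache</em> class and <em>fake_s3_fetch</em> function are hypothetical names; in a real app the fetch callable would download from S3 (with boto, for example). Keys are assumed to be flat file names, and expiry is left out to keep it short.</p>

<pre><code>import os
import tempfile


class LocalCache(object):
    """Disk cache in front of a slower primary store such as S3."""

    def __init__(self, cache_dir, fetch):
        self.cache_dir = cache_dir
        self.fetch = fetch  # callable: key to bytes, e.g. an S3 download

    def _path(self, key):
        return os.path.join(self.cache_dir, key)

    def put(self, key, data):
        """Store data locally; call this at upload time to pre-warm the cache."""
        with open(self._path(key), "wb") as f:
            f.write(data)

    def get(self, key):
        """Return cached bytes, falling back to the primary store on a miss."""
        path = self._path(key)
        if not os.path.exists(path):
            self.put(key, self.fetch(key))
        with open(path, "rb") as f:
            return f.read()


# Stand-in for S3 so the sketch runs anywhere; it records each read.
reads = []


def fake_s3_fetch(key):
    reads.append(key)
    return b"dataset bytes for " + key.encode("ascii")


cache = LocalCache(tempfile.mkdtemp(), fake_s3_fetch)
cache.put("uploaded.csv", b"fresh upload")  # cached at upload time
cache.get("uploaded.csv")                   # hit: no read from the store
cache.get("old.csv")                        # miss: one read from the store
cache.get("old.csv")                        # hit: served from disk
print(reads)  # ['old.csv']
</code></pre>

<p>Only the one genuine miss ever reaches the store, which is where the savings on transfer fees come from.</p>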

<p>Our biggest concern with S3 was protecting the PHI. S3 offers <a href="http://docs.amazonwebservices.com/AmazonS3/latest/dev/UsingServerSideEncryption.html">server side encryption</a> that is disturbingly easy to use. This feature is meant to earn them &#8220;check box&#8221; compliance with various regulations, including HIPAA. But, unfortunately, we trust no one, remember? How do we know our keys are secure? How do we know that a compromised user account isn&#8217;t getting used to bypass the encryption entirely?</p>

<p>We had to roll our own application level encryption with keys under our own control. Our code simply encrypts (and compresses) our data before it is uploaded to S3, using pycrypto for the encryption. We encrypt locally cached objects as well.</p>

<p>Since it was easy, we left the SSE encryption in place too. Double encryption. Overkill? Probably.</p>

<p>Some tips:</p>

<ul>
<li>If you are using Python, use <a href="https://github.com/boto/boto">boto</a> to access S3 (or any AWS service).</li>
<li>Set up restricted users in IAM that only have access to one bucket (e.g. dev vs prod) and limit them to as few privileges as you can. </li>
</ul>

<h2>Automated EBS snapshots</h2>

<p>Part of satisfying the &#8220;lose less than one hour of data&#8221; requirement meant that we had to handle catastrophic failure of <em>both</em> database servers in a replication set. Our initial plan was to replicate the data to a slave outside of our region, but this proved to be an annoying problem with a VPC. We would need to set up a software region to region VPN or relax our security. Neither option appealed to us. Instead, we utilized a little Python/boto script that creates and rotates EBS snapshots once an hour. The EBS snapshots are stored in S3 and thus are as safe as any of our non-database critical data: <a href="http://aws.amazon.com/s3/faqs/#How_is_Amazon_S3_designed_to_achieve_99.999999999%_durability">99.999999999% durability</a>. If we lose a region (and all database servers within) we can at least recover back to the last snapshot. Of course, if we just lose one database server, we are only going to be missing any non-replicated writes.</p>
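
<p>The rotation half of such a script boils down to &#8220;keep the newest N, delete the rest&#8221;. The sketch below shows just that decision with made-up snapshot ids; it is not our actual script. The <em>snapshots_to_delete</em> name and the retention of 24 are assumptions for illustration. In a real script the list would come from the EC2 API via boto, and each returned id would then be deleted.</p>

<pre><code>from datetime import datetime, timedelta


def snapshots_to_delete(snapshots, keep=24):
    """Return the ids of all but the newest `keep` snapshots.

    `snapshots` is a list of (snapshot_id, start_time) tuples.
    """
    newest_first = sorted(snapshots, key=lambda s: s[1], reverse=True)
    return [snap_id for snap_id, _ in newest_first[keep:]]


# Fake hourly snapshots going back 30 hours; the ids are made up.
now = datetime(2012, 3, 15, 12, 0)
fake = [("snap-%02d" % h, now - timedelta(hours=h)) for h in range(30)]

expired = snapshots_to_delete(fake, keep=24)
print(len(expired))  # 6
print(expired[0])    # snap-24
</code></pre>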

<h2>CloudWatch monitoring</h2>

<p>To track all this fancy infrastructure, you need a monitoring system. Lucky for you, Amazon is there once again with a solution: CloudWatch. For a small monthly fee, you can get detailed metrics on disk usage, network IO, and CPU usage across all your VMs, your ELBs, and most of the other AWS products. For another small fee, you can report your own custom metrics on absolutely anything. All this metric reporting comes with a powerful system to set up alarms when the numbers cross critical thresholds. And graphs. Awesome graphs.</p>

<p><strong>Whoops. Someone left the iron plugged in.</strong>
<img title="Whoops" src="http://fzysqr.com/wp-content/uploads/2012/03/cpu.png" class="aligncenter" /></p>

<p>We use custom CloudWatch metrics for a few different alarms:</p>

<ul>
<li>Free disk space on our EBS volumes</li>
<li>MySQL replication status</li>
<li>MySQL process status</li>
</ul>

<p>Again, boto makes it easy to write little Python scripts to build up and report metrics. We run them from cron jobs on each server.</p>
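
<p>As a rough idea of what one of these scripts looks like, here is a sketch of the free disk space metric. It is not our actual script: the metric name is made up, and <em>put_metric</em> is a stand-in callable. In a real script it would wrap boto&#8217;s CloudWatch connection and its <em>put_metric_data</em> call.</p>

<pre><code>import shutil


def free_disk_percent(path):
    """Return the percentage of free space on the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


def report(path, put_metric):
    """Measure free space and hand it to put_metric(name, value, unit)."""
    value = free_disk_percent(path)
    put_metric("FreeDiskPercent", value, "Percent")
    return value


# Collect the metric in a list instead of calling AWS,
# so the sketch runs anywhere.
sent = []
report("/", lambda name, value, unit: sent.append((name, value, unit)))
print(sent)
</code></pre>

<p>Keeping the measurement separate from the reporting call makes the script easy to test without touching AWS.</p>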

<h2>Chef Server, chef recipes, infrastructure automation</h2>

<p>Throughout this post, I have discussed a ton of different configuration and implementation work we have had to do. Some of it probably sounds much worse than it was. This is because we make great use of Chef and Chef Server, an <a href="http://wiki.opscode.com/display/chef/Chef+Server">infrastructure automation tool</a>. Chef is a set of different server components that are all tied together around a Ruby <a href="http://en.wikipedia.org/wiki/Domain-specific_language">domain specific language</a>. Basically, you compile &#8220;recipes&#8221; into &#8220;cookbooks&#8221; and then deploy them out to your server environments.</p>

<p>The upshot of using Chef is that we can write something like our disk space monitoring metric script above, add it to our Chef repository, and within the hour it will be installed automatically on every server in our environment. There are lots of great freely available Chef recipes to build on and we have a ton of our own custom recipes.</p>

<p>We use Chef to:</p>

<ul>
<li>replicate ssh keys out to each environment</li>
<li>install all of our software dependencies</li>
<li>set up our metrics scripts</li>
<li>install and update cron jobs</li>
<li>run EBS snapshots</li>
<li>set up user groups and file system permissions</li>
</ul>

<p>&#8230; and much more. We chose to host our own installation of Chef Server (because we&#8217;re cheap) but you can have Opscode <a href="http://www.opscode.com/hosted-chef/">do it for you</a>, for a monthly fee.</p>

<h2>Automated deploys using TeamCity</h2>

<p>Our last bit of trickery is our fully automated code deployments. We use TeamCity and copious bash scripting to make this work. Because of the VPC, it has to be a bit more complex than we prefer. To deploy, TeamCity:</p>

<ul>
<li>checks out source from Kiln</li>
<li>runs unit tests</li>
<li>packages code into tarball</li>
<li>starts a VPN connection to the target VPC</li>
<li>scp transfers the tarball to the target servers in the VPC</li>
<li>uncompresses the tarball and runs the deploy script</li>
<li>runs database migrations</li>
<li>restarts Apache using graceful</li>
<li>disconnects the VPN</li>
</ul>

<p>Having a reliable and well-tested automated deployment process reduces the risk of deployment related downtime. We often deploy mid-day and our users don&#8217;t even know it happened. *</p>

<p><em>*Protip: this requires solid upstream testing automation to work.</em></p>

<h2>Tired of typing</h2>

<p>My original idea for this blog post was a bit grand. There are simply too many things I would love to share. All the little gotchas and technical details. So let&#8217;s consider this the overview. This can be the framework for me to hang little tidbits of implementation and chunks of code on over the next several months. If there is anything you would love to see first, please let me know at <a href="mailto:comments@fzysqr.com">comments@fzysqr.com</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://fzysqr.com/2012/03/15/a-guide-to-hosting-your-hipaa-app-in-amazon-web-services/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Use sed for quick and dirty templates in your deploy scripts</title>
		<link>http://fzysqr.com/2012/03/03/use-sed-for-quick-and-dirty-templates-in-your-deploy-scripts/</link>
		<comments>http://fzysqr.com/2012/03/03/use-sed-for-quick-and-dirty-templates-in-your-deploy-scripts/#comments</comments>
		<pubDate>Sat, 03 Mar 2012 20:09:09 +0000</pubDate>
		<dc:creator>jslatts</dc:creator>
				<category><![CDATA[dev]]></category>

		<guid isPermaLink="false">http://fzysqr.com/?p=598</guid>
		<description><![CDATA[A few years ago, my team and I invested the effort to build a continuous integration environment. We use TeamCity and we are quite happy with it. Once you have a platform for automated builds, you quickly become interested in automated deploys. Embarking on the painful journey to get your software to &#8220;1-click&#8221; production deploys [...]]]></description>
			<content:encoded><![CDATA[<p>A few years ago, my team and I invested the effort to build a continuous integration environment. We use <a href="http://www.jetbrains.com/teamcity/">TeamCity</a> and we are quite happy with it. Once you have a platform for automated builds, you quickly become interested in automated deploys. Embarking on the painful journey to get your software to &#8220;1-click&#8221; production deploys is one of the healthiest investments you can make in your dev infrastructure.</p>

<p>These days we are a ways past simple deploys. Our CI infrastructure runs tests, deploys code, migrates databases, deploys infrastructure, and even moves datasets around our environments. All this good automation is glued together with various build scripts&#8211;some in python, some in bash.</p>

<p><span id="more-598"></span></p>

<p>If you are like us, you have multiple products running in multiple environments. Alpha, beta, and UAT deploys for each product, potentially hosted in multiple geographic regions in clustered deployments. All these different environments mean different configurations need to be selected at deploy time. We use a variety of methods to tackle this. On our .NET apps, we use msbuild configurations to select the correct *.config transformations. In our Django apps, we use a homegrown system that will select settings.py files based on the host the code is running on.</p>

<p>We also need to be able to deploy different Apache configurations depending on the target environment. One of our requirements (maybe goal is a better word) is that we should be able to deploy code to a new environment without any pre-configuration, assuming the correct infrastructure is in place. If I need to throw out a new instance of our app to demo a feature change to a customer, it should be a matter of copying the build config in TeamCity, changing the name of the instance, and firing it off.</p>

<p>All this is just background information for me to share a cute little tip with you. When we need to deploy a different Apache config for a different environment, we use sed to make quick and dirty mustache-style templates. <a href="http://en.wikipedia.org/wiki/Sed">Sed</a> (stream editor) is an old standby in the bearded unix hacker&#8217;s toolkit. It will parse a text stream, executing regex string replacements, and stream out the result. By defining a standard token format, in this case {{mustache style}} tokens, we can use sed to transform our Apache configs as a lightweight templating system.</p>

<p><strong>Example template (named default_template) is bundled with source:</strong></p>

<pre><code>&lt;VirtualHost *:80&gt;
  ServerName {{HOSTNAME}}:443

  Alias /static {{DEPLOY_PATH}}/static
  &lt;Directory {{DEPLOY_PATH}}/static&gt;
    Order allow,deny
    Allow from all
    Options -Indexes
  &lt;/Directory&gt;

  WSGIScriptAlias / {{DEPLOY_PATH}}/django.wsgi
&lt;/VirtualHost&gt;
</code></pre>

<p><strong>Execute transformation during deploy using sed and copy into place:</strong></p>

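<pre><code>#!/bin/bash
# Usage: deploy.sh CONFIG_NAME HOSTNAME
set -e

CONFIG_NAME=$1
HOSTNAME=$2
# sed uses / as its delimiter here, so the path needs escaped slashes
DEPLOY_PATH_SED="\/var\/www\/$CONFIG_NAME"
DEPLOY_PATH=/var/www/$CONFIG_NAME
APACHE_PATH='/etc/apache2'

# Replace the {{TOKENS}} in the template and write out the site config
sed -e "s/{{HOSTNAME}}/$HOSTNAME/g; s/{{DEPLOY_PATH}}/$DEPLOY_PATH_SED/g" default_template &gt; "$CONFIG_NAME"

cp "$CONFIG_NAME" "$APACHE_PATH/sites-available/"

# -f replaces an existing link so repeat deploys do not fail
ln -sf "$APACHE_PATH/sites-available/$CONFIG_NAME" "$APACHE_PATH/sites-enabled/$CONFIG_NAME"

apache2ctl graceful
</code></pre>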

<p>These are just snippets of our full deploy script and Apache template, but you get the idea. A few small points about this technique:</p>

<ol>
<li>We use name-based virtual hosts to run multiple dev environments on one server.</li>
<li>You can chain together multiple replacements using semi-colons.</li>
<li>You must be careful to escape the forward slashes in any path you substitute, because sed is using the slash as its delimiter. In my scripts, I have to define separate variables for the sed replacement and the file copy.</li>
</ol>
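
<p>For the deploy scripts we keep in Python, the same token replacement can be sketched without any slash escaping. This is an illustrative alternative, not what the bash script above does; the <em>render</em> function and <em>TOKEN</em> pattern are made-up names.</p>

<pre><code>import re

# Matches {{NAME}} tokens and captures NAME.
TOKEN = re.compile(r"\{\{(\w+)\}\}")


def render(template, values):
    """Replace each {{NAME}} token with values[NAME].

    An unknown token name raises KeyError instead of slipping through.
    """
    return TOKEN.sub(lambda match: values[match.group(1)], template)


template = "ServerName {{HOSTNAME}}:443\nAlias /static {{DEPLOY_PATH}}/static\n"
print(render(template, {
    "HOSTNAME": "alpha.fzysqr.com",
    "DEPLOY_PATH": "/var/www/alpha",
}))
</code></pre>

<p>Passing a function as the replacement means the substituted value is inserted literally, so paths with slashes or backslashes need no special handling.</p>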

<p>TeamCity simply runs <em>sh deploy.sh alpha alpha.fzysqr.com</em> and everything drops into place.</p>

<p>Yay devops!</p>
]]></content:encoded>
			<wfw:commentRss>http://fzysqr.com/2012/03/03/use-sed-for-quick-and-dirty-templates-in-your-deploy-scripts/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>node grinding the crack (between Windows and Linux)</title>
		<link>http://fzysqr.com/2011/11/13/node-grinding-the-crack/</link>
		<comments>http://fzysqr.com/2011/11/13/node-grinding-the-crack/#comments</comments>
		<pubDate>Mon, 14 Nov 2011 05:51:20 +0000</pubDate>
		<dc:creator>jslatts</dc:creator>
				<category><![CDATA[dev]]></category>

		<guid isPermaLink="false">http://fzysqr.com/?p=566</guid>
		<description><![CDATA[Let me apologize up front for the terrible title. I have been recently inspired. I met some fantastic people at the fantastic Keeping It Realtime Conference last week. The speakers were great. The conversations were fantastic. The parties were fun. I will go again next year. There was so much information to consume that I [...]]]></description>
			<content:encoded><![CDATA[<p>Let me apologize up front for the terrible title. I have been recently <em><a href="http://www.youtube.com/watch?v=TWfph3iNC-k">inspired</a>.</em></p>

<p>I met some fantastic people at the <strong>fantastic</strong> <a href="http://krtconf.com/">Keeping It Realtime Conference</a> last week. The speakers were great. The conversations were fantastic. The parties were fun. I will go again next year. There was so much information to consume that I am just now sorting some of it out. In my mind, the experience plays back like ten thousand snapshots in rapid succession. A blur of events, conversations, people, code, and crappy wifi. A couple of faces stand out: the crew from Microsoft, Paul Batum, Glenn Block, and of course Scott Hanselman. They left a big impression on me. Not because I was in shock that Microsoft was actually represented at this event (though I was), or that these guys were deep behind enemy lines (you should have seen the mac/win ratio), or because they were all wickedly smart (they were). No, it was that Microsoft seems to be genuinely making a go of supporting node on Windows and it actually looks pretty good.</p>

<p>Okay. So what does this have to do with base jumping wingsuit insanity? Well, dear reader, not much. But I&#8217;ll try and tie it back. If you browse back through my archives or follow me on twitter, you will get a lot of Linux and node content. But every so often, something Windowsy slips through. You see, the truth is: I am a .NET developer. Or at least I used to be. Now, I manage a team of developers&#8211;a mixed team. We are about 50/50 .NET and Django with a sprinkle of node on top. Our various products have to integrate with each other and I make platform decisions on a daily basis where I am pitting a Windows stack against a Linux one. In short, I &#8220;grind the crack&#8221; between them. But without the wingsuit and inevitable spectacular death.</p>

<p>Back to the conference. At the opening event, I met Glenn and Paul. I had a great discussion with them about where my team is and where we are going. I like node and I see some great use cases for it in our business. At one point, Glenn asked me, &#8220;What would it take to get you to choose Windows?&#8221; I answered that it was probably too late for us, &#8220;We have node programs running on Linux. Why would we switch?&#8221; I told him that I wholeheartedly supported node on Windows for the good of the node community, but that we were already past the point of needing it ourselves.</p>

<p>Fast forward two days. I make sure to attend Glenn&#8217;s and Tomasz&#8217;s talk on node for Windows (the main point behind the recent 0.6.X release by the way). I didn&#8217;t expect to hear anything I wasn&#8217;t already aware of. And I was surprised. Floored even. They have actually done a great job creating a Windows &#8220;story&#8221; for node. They want you to run node with IIS. If you do that, you get some <a href="http://tomasz.janczuk.org/2011/08/hosting-nodejs-applications-in-iis-on.html">pretty awesome stuff</a> for free:</p>

<ul>
<li>built-in process management ala <a href="http://blog.nodejitsu.com/keep-a-nodejs-server-up-with-forever">forever</a></li>
<li>load balancing between node processes</li>
<li><em>graceful</em> auto-refresh of the node process when code changes</li>
<li>remote node-inspector (hell yes!)</li>
<li>logs over http </li>
</ul>

<p>Yes, I know you can get all these things on Linux. And, maybe there are arguments to be made about scaling (haven&#8217;t seen benchmarks, so I have no idea). But you know what? I <em>don&#8217;t have</em> scaling issues. We have a modest user base. Our biggest challenges are building a great experience for our users, integrating disparate systems, and maintaining our automated deployment and testing infrastructure. With iisnode, I can just include the node JavaScript alongside our existing .NET app and ship it with the same TeamCity deployment. I won&#8217;t even need to restart IIS!</p>

<p>We were already planning on using node and WebSockets to bring some realtime features to our ASP.NET MVC app and Microsoft just made my life <em>simpler</em>. Glenn: you got me back (at least partly). I am excited to see what you guys do next.</p>

<p>Bravo.</p>
]]></content:encoded>
			<wfw:commentRss>http://fzysqr.com/2011/11/13/node-grinding-the-crack/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Quick vim tip #1 (and I am not dead)</title>
		<link>http://fzysqr.com/2011/09/24/vimtip1/</link>
		<comments>http://fzysqr.com/2011/09/24/vimtip1/#comments</comments>
		<pubDate>Sat, 24 Sep 2011 16:55:41 +0000</pubDate>
		<dc:creator>jslatts</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://fzysqr.com/?p=559</guid>
		<description><![CDATA[Sorry for the summer hiatus. Work, vacation, and mountain biking put a serious squeeze on my free time hacking and blogging activities. I have been working on some cool stuff lately and the cold weather is starting to arrive here in Spokane, so I should have plenty of blog posts coming up. Anyway, quick vim [...]]]></description>
			<content:encoded><![CDATA[<p>Sorry for the summer hiatus. Work, vacation, and mountain biking put a serious squeeze on my free time hacking and blogging activities. I have been working on some cool stuff lately and the cold weather is starting to arrive here in Spokane, so I should have plenty of blog posts coming up.</p>

<p>Anyway, quick vim tip #1. If I have a chunk of code with a string:</p>

<pre><code>var blah = "Some string that I want to replace" + previouslyDefinedVariable
</code></pre>

<p>I want to replace it with &#8220;new string&#8221;. In vim, do:</p>

<ol>
<li>Put your cursor on the first character inside the quotes of the string you want to end up with, in our case, the <em>n</em> from &#8220;new string&#8221;.</li>
<li>Type <em>yi&quot;</em> &#8211; yank in between the quotes</li>
<li>Navigate to the string you want to replace</li>
<li>Type <em>di&quot;</em> &#8211; delete in between the quotes</li>
<li>Type <em>&quot;0P</em> &#8211; paste the content from the yank register</li>
<li>Marvel at your vim prowess.</li>
</ol>

<p>Got any great vim tips? I would love to hear about them. Send them to <a href="mailto:vimtips@fzysqr.com">vimtips@fzysqr.com</a>. Check out my full vim config, a fork of scrooloose&#8217;s config, on github <a href="https://github.com/jslatts/vimfiles">here</a>.</p>

<p><strong>Updated</strong> Cory Schmitt wrote in with a helpful suggestion that saves two keystrokes!</p>

<ol>
<li>Put your cursor on the first character inside the quotes of the string you want to end up with, in our case, the <em>n</em> from &#8220;new string&#8221;.</li>
<li>Type <em>yi&quot;</em> &#8211; yank in between the quotes</li>
<li>Navigate to the string you want to replace</li>
<li>Type <em>vi&quot;p</em> &#8211; visually select in between the quotes and paste over the selection</li>
<li>Congratulate yourself for being awesome.</li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://fzysqr.com/2011/09/24/vimtip1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Version 0.0.9 of Stalker for node.js is available</title>
		<link>http://fzysqr.com/2011/06/20/version-0-0-9-of-stalker-for-node-js-is-available/</link>
		<comments>http://fzysqr.com/2011/06/20/version-0-0-9-of-stalker-for-node-js-is-available/#comments</comments>
		<pubDate>Tue, 21 Jun 2011 05:38:14 +0000</pubDate>
		<dc:creator>jslatts</dc:creator>
				<category><![CDATA[dev]]></category>

		<guid isPermaLink="false">http://fzysqr.com/2011/06/20/version-0-0-9-of-stalker-for-node-js-is-available/</guid>
		<description><![CDATA[I have just released 0.0.9 of Stalker. Improvements (some from 0.0.7 as well) are: Added batch mode to group callbacks Added second callback for file removal notices (this works in batch mode as well) Switched unit tests to vows.js and added additional coverage Made errors callback immediately in standard node-fashion. Fixed bugs Get it on [...]]]></description>
			<content:encoded><![CDATA[<p>I have just released 0.0.9 of Stalker. Improvements (some from 0.0.7 as well) are:</p>

<ul>
<li>Added batch mode to group callbacks</li>
<li>Added second callback for file removal notices (this works in batch mode as well)</li>
<li>Switched unit tests to vows.js and added additional coverage</li>
<li>Errors are now passed to the callback immediately, in standard node fashion.</li>
<li>Fixed bugs</li>
</ul>

<p>Get it on npm now or head to <a href="https://github.com/jslatts/stalker">github</a>.</p>

<p>Thank you for all the feedback.</p>
]]></content:encoded>
			<wfw:commentRss>http://fzysqr.com/2011/06/20/version-0-0-9-of-stalker-for-node-js-is-available/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Introducing stalker, a node.js module. Now your files can get restraining orders against you too.</title>
		<link>http://fzysqr.com/2011/06/08/introducing-stalker-a-node-js-module-now-your-files-can-get-restraining-orders-against-you-too/</link>
		<comments>http://fzysqr.com/2011/06/08/introducing-stalker-a-node-js-module-now-your-files-can-get-restraining-orders-against-you-too/#comments</comments>
		<pubDate>Thu, 09 Jun 2011 05:14:00 +0000</pubDate>
		<dc:creator>jslatts</dc:creator>
				<category><![CDATA[dev]]></category>

		<guid isPermaLink="false">http://fzysqr.com/2011/06/08/new-node-js-module/</guid>
		<description><![CDATA[I released my first node.js module today on npm, aptly named stalker. stalker is basically a wrapper around fs.watchFile(). It will watch a directory tree for new files and fire off your callback whenever it finds one. It tries to be smart about entire nested folder/file structures and tricksy add/removal/add type of stuff. Get it [...]]]></description>
			<content:encoded><![CDATA[<p>I released my first node.js module today on npm, aptly named stalker. stalker is basically a wrapper around fs.watchFile(). It will watch a directory tree for new files and fire off your callback whenever it finds one. It tries to be smart about entire nested folder/file structures and tricksy add/removal/add type of stuff.</p>

<p>Get it on npm: <em>npm install stalker</em></p>

<p>You can test it by running <em>node example/test.js</em> and then dropping files and folders in the example directory.</p>

<p>To incorporate it in your program, simply call stalker.watch with your callback:</p>

<pre><code>var stalker = require('stalker');

stalker.watch('/var/foo', function(err, file) {
  if (err) { return console.error(err); }
  console.log('stalker saw file %s', file);
});
</code></pre>

<p>Code is at github: <a href="https://github.com/jslatts/stalker">https://github.com/jslatts/stalker</a></p>
]]></content:encoded>
			<wfw:commentRss>http://fzysqr.com/2011/06/08/introducing-stalker-a-node-js-module-now-your-files-can-get-restraining-orders-against-you-too/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>node-inspector and the missing --start-brk option</title>
		<link>http://fzysqr.com/2011/05/29/node-inspector-and-the-missing-start-brk-option/</link>
		<comments>http://fzysqr.com/2011/05/29/node-inspector-and-the-missing-start-brk-option/#comments</comments>
		<pubDate>Sun, 29 May 2011 19:47:26 +0000</pubDate>
		<dc:creator>jslatts</dc:creator>
				<category><![CDATA[dev]]></category>
		<category><![CDATA[nodejs]]></category>

		<guid isPermaLink="false">http://fzysqr.com/2011/05/29/node-inspector-and-the-missing-start-brk-option/</guid>
		<description><![CDATA[Quick tip to anyone having trouble finding the right way to start a node-inspector debugging session with an initial break point. Lot&#8217;s of youtubes and how-to&#8217;s mention a &#8211;start-brk=file.js option. If you try and actually use it, it doesn&#8217;t work. The correct way to do this now is: node-inspector &#38; node --debug-brk --debug server.js Head [...]]]></description>
			<content:encoded><![CDATA[<p>Quick tip to anyone having trouble finding the right way to start a node-inspector debugging session with an initial break point. Lots of YouTube videos and how-tos mention a --start-brk=file.js option. If you actually try to use it, it doesn&#8217;t work. The correct way to do this now is:</p>

<pre><code>node-inspector &amp;
node --debug-brk --debug server.js 
</code></pre>

<p>Head to http://0.0.0.0:8080/debug?port=5858 to find your app nicely stopped on the first line.</p>

<p>Happy noding!</p>
]]></content:encoded>
			<wfw:commentRss>http://fzysqr.com/2011/05/29/node-inspector-and-the-missing-start-brk-option/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

