Exiting Node.js Asynchronously

Written 5 years ago November 10th, 2015 by Cédric Ronvel
node.js · async

Sayonara!

Node, exit and asynchronous cleanup

Node can exit for three main reasons:

implicitly: when there is no more code to be executed AND the event loop runs dry (i.e. there is no outstanding timer or I/O listener).
explicitly: when the code calls process.exit().
accidentally: an uncaught exception bubbles up to the event loop, and there is no uncaught exception listener on the process.

If you are explicitly closing the app, you can trigger whatever cleanup code you need, before calling process.exit().

However, there are many cases where you cannot predict when your program will exit: e.g. if there is simply no more job scheduled in the event loop, your program has done its job and will exit.

If some cleanup code has to be performed, we can listen for the exit event on the process object. However, the program will exit as soon as all listeners have returned: only synchronous tasks can be performed here!

But the very nature of Node.js is asynchronous: a lot of the things we would want to do cannot be done synchronously.

For example: we want to log the context and status when exiting, but one of our transports is async (e.g. it logs to a database or over the network). If we send data over the wire from that kind of listener, there is a good chance it will never reach its destination.
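To make the limitation concrete, here is a minimal sketch. It runs a tiny script in a child process so that we can observe its exit from the outside. The names childScript and exitOutput are only illustrative, and the zero-delay timer stands in for any asynchronous transport.

```javascript
const { spawnSync } = require('child_process');

// Script run in a child process, so that we can watch it exit.
const childScript = `
  process.on('exit', function (code) {
    // Synchronous work completes before the process ends.
    console.log('sync cleanup done');
    // This timer is scheduled too late: the event loop is no longer running.
    setTimeout(function () {
      console.log('async cleanup done');
    }, 0);
  });
`;

const child = spawnSync(process.execPath, ['-e', childScript], { encoding: 'utf8' });
const exitOutput = child.stdout.trim();

console.log(exitOutput);
```

Only the synchronous message should show up: the timer callback is never given a chance to run.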

Introducing async-kit

It will sound like I'm doing (again) my self-promotion here, but there is a great package (sic!) that does the job: async-kit. This is the first package I ever wrote. It is a toolbox that deals with whatever async stuff you have to do, and it has been tested for nearly two years now... and it still receives a few improvements once in a while!

The last feature addition it received is the async.exit() method. This is a replacement for process.exit(), and its purpose is to exit asynchronously.

Maybe I should exit asynchronously, after the rain is gone...

async.exit( code , timeout )

code number the exit code
timeout number the maximum time allowed for each underlying listener before aborting, defaults to 1000 (ms).

When you call async.exit(), it emits the asyncExit event on the process object.

There are two kinds of listeners:

function( [code] , [timeout] ) listeners, which do not have a callback, are interested in the event but either don't need to perform critical tasks or can handle them synchronously. E.g.: a server that will not accept connections or data anymore after receiving this event.
function( code , timeout , completionCallback ) listeners, which DO have a completion callback, have some critical asynchronous tasks to perform before exiting. E.g.: a server that needs to exit gracefully will not accept connections or data anymore, but it still has to wait for the client requests in progress to complete.

Note that the code and timeout arguments passed to listeners are the actual values used by async.exit().

So async.exit() will simply wait for every listener having a completionCallback to trigger it (or to time out) before exiting, using process.exit() internally.

process.on( 'asyncExit' , function( code , timeout , callback ) {

    console.log( 'asyncExit event received - starting a short task' ) ;

    setTimeout( function() {
        console.log( 'Short task finished' ) ;
        callback() ;
    } , 100 ) ;
} ) ;

process.on( 'asyncExit' , function( code , timeout ) {

    console.log( 'asyncExit event received - non-critical task' ) ;

    setTimeout( function() {
        console.log( 'Non-critical task finished' ) ;
    } , 200 ) ;
} ) ;

async.exit( 5 , 500 ) ;

After 100ms, it will produce:

asyncExit event received - starting a short task
asyncExit event received - non-critical task
Short task finished

Note how the setTimeout()'s function is not executed in the second event handler: this handler does not accept a callback, hence the process will exit as soon as the first handler is done: after 100ms.
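For the curious, the waiting logic described above can be sketched roughly as follows. This is NOT async-kit's actual source: asyncExitSketch() is a hypothetical helper, it assumes a listener is "critical" when it declares three parameters, and it takes the final exit function as an argument (instead of calling process.exit() directly) so the sketch can be run without killing the process.

```javascript
const { EventEmitter } = require('events');

// Hypothetical sketch: notify every 'asyncExit' listener, then call
// finalExit(code) once all callback-taking listeners are done or timed out.
function asyncExitSketch(emitter, code, timeout, finalExit) {
  // One extra slot is held until every listener has been notified.
  let pending = 1;

  function release() {
    pending -= 1;
    if (pending === 0) finalExit(code);
  }

  emitter.listeners('asyncExit').forEach(function (listener) {
    // Assumption: fewer than three declared parameters means "no callback".
    if (listener.length < 3) {
      listener(code, timeout);
      return;
    }

    pending += 1;
    let settled = false;
    // Each critical listener gets its own time budget.
    const timer = setTimeout(settle, timeout);

    function settle() {
      if (settled) return; // ignore double calls and late callbacks
      settled = true;
      clearTimeout(timer);
      release();
    }

    listener(code, timeout, settle);
  });

  release();
}

// Demo on a plain emitter standing in for the process object.
const fakeProcess = new EventEmitter();
fakeProcess.on('asyncExit', function (code, timeout, done) {
  setTimeout(done, 10);
});
fakeProcess.on('asyncExit', function (code, timeout) {
  // non-critical listener: nobody waits for it
});
asyncExitSketch(fakeProcess, 5, 500, function (code) {
  console.log('would exit with code ' + code);
});
```

In a real program the last argument would simply be process.exit.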

Nice, isn't it?

This is a good day to die...

Now you will never use process.exit() anymore!

Happy coding!

Javascript API Design: Function's Arguments Order

Written 6 years ago July 6th, 2015 by Cédric Ronvel
node.js · javascript · API design

Flight formation

The number one rule is consistency... because nobody wants to use a broken API like PHP's... It's very clear and does not need further explanation.

Ok, so let's keep it consistent, but what order should we use anyway?

When designing a Javascript API, an important point one may consider is the binding possibility (i.e. using .bind() on the API methods).

Ask yourself what is the most constant parameter and what is the most variable parameter for a method.

Let's use a real-world example here.

I'm currently working on a new general purpose data validation lib (doormen). The main feature of the lib is a function that takes a schema and some data as arguments, and throws if the data does not validate against the schema.

As of doormen v0.2.x, the syntax is now doormen( schema , data ):

// do not throw
doormen( { type: 'number' } , 1 ) ;

// throw: 'hello' is not a number
doormen( { type: 'number' } , 'hello' ) ;

But before v0.2.x, the syntax was doormen( data , schema ). What a terrible design mistake I made back then!

As you can imagine, a single schema aims to validate many pieces of data, hence there will be more unique data than unique schemas. The schema parameter is less variable than the data parameter.

Therefore, as of v0.2.x, one is able to create a specific validator easily using .bind(), e.g.:

var singleDigitNumberValidator = doormen.bind( null , { type: 'integer', min: 0, max: 9 } ) ;

singleDigitNumberValidator( 3 ) ; // OK
singleDigitNumberValidator( 13 ) ; // Not OK
singleDigitNumberValidator( 'bob' ) ; // Not OK
...

After binding, only one argument remains!

Before v0.2.x, slightly more code was needed, and one more function call to achieve the same:

// The wrapper only needs the data: the schema is fixed inside it
var singleDigitNumberValidator = function( data ) {
    return doormen( data , { type: 'integer', min: 0, max: 9 } ) ;
} ;

singleDigitNumberValidator( 3 ) ; // OK
singleDigitNumberValidator( 13 ) ; // Not OK
singleDigitNumberValidator( 'bob' ) ; // Not OK
...

I think this rule can apply to languages that do not have .bind() too, since it follows a “generic to specific” pattern.
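Here is a minimal sketch of that “generic to specific” idea without .bind(), using a plain closure. The validate() function below is a hypothetical stand-in written for this example (it is NOT doormen), and makeValidator() is an illustrative name.

```javascript
// Hypothetical stand-in for a schema-first validator (this is NOT doormen).
function validate(schema, data) {
  if (schema.type === 'integer' && !Number.isInteger(data)) {
    throw new TypeError(String(data) + ' is not an integer');
  }
  if (typeof schema.min === 'number' && data < schema.min) {
    throw new RangeError(data + ' is lower than ' + schema.min);
  }
  if (typeof schema.max === 'number' && data > schema.max) {
    throw new RangeError(data + ' is greater than ' + schema.max);
  }
  return data;
}

// Fix the generic argument once, get back a function of the specific one.
function makeValidator(schema) {
  return function (data) {
    return validate(schema, data);
  };
}

const singleDigit = makeValidator({ type: 'integer', min: 0, max: 9 });
console.log(singleDigit(3));
```

Because the most constant parameter comes first, the factory stays trivial; with the data-first order, every specialized validator would need its own hand-written wrapper.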

Free Orthoptic Stereogram

Written 6 years ago June 29th, 2015 by Cédric Ronvel
stereogram

“That's a cat! I see it!”

Working all day long on a computer can lead to various ailments, such as headaches or migraines. It can also weaken your eyes.

When I work hard many days in a row, my eyes cannot cross correctly anymore and it causes huge headaches. I have to exercise my eyes with stereograms to recover.

A few months ago, I lost the stereogram cards that the orthoptist gave me. So I tried googling for some cards, only to find out that you have to pay for them! It costs between $8 and $15 for a single A4 page PDF. In my opinion, that's a shame: using a vector graphics program (like Inkscape), it only takes a few minutes to create those simple shapes. I don't see why it shouldn't be free just because it's related to health. Oh, I should have said the contrary: because it's related to health, it must be free.

So I have created a Github repository for that. It contains the PNG and the SVG source, and it's licensed under the GPL v3.

Please contribute to the project!

This is my first original stereograms set.

How does it work? Cross your eyes until both images merge into a third one in the middle. Your left eye looks at the right part of the image, while your right eye looks at the left part.

The second and the third stereograms contain a 3D effect: some parts of the image will appear closer! I suggest you print them on thick paper. You can move the card around: to the left, to the right, upward, downward, backward, forward. It trains your eyes to cross.

The first stereogram does not feature 3D. The left and the right part are an incomplete picture of a cat, the merged one is a complete cat. This trains your brain to pay attention to both eyes rather than overusing one over the other.

By the way, you should ask a professional orthoptist if those exercises suit you as well. There are also other important exercises. Every hour you should take a break and look as far away as you can. This will relax some muscles of your eyes (those tired by the computer screen) and put some others to work (those not used when looking at a screen).

I hope it will help keep your eyes healthy!

See ya!

From PHP to Node.js - Part. III: Language Comparison

Written 6 years ago June 24th, 2015 by Cédric Ronvel
node.js · node.js vs php

A Duel in the Snow!

Now it's time to compare the Javascript language to the PHP language. By the way, I will not talk about the various browser implementations: we will compare apples with apples here, so we will talk about Javascript inside of Node.js.

The PHP language

PHP is probably the most hated language. There are tons of PHP bashing articles in the world. But what exactly is PHP and why the hate?

It's hard to tell exactly what PHP is. PHP looks more or less like a random and extensive compilation of tools. This is probably what makes it simultaneously great and bad: everything exists in the core language: procedural, class-based OOP with lots of features, tons of functions and methods...

You want to manipulate arrays? There is probably a builtin function doing what you want already: you don't have to code it by yourself. So you can focus on your application business logic.

You like OOP? PHP features almost everything that a class-based programming language has: classes, abstract classes, final classes, inheritance, constructors, destructors, static methods, interfaces, scope resolution, late static binding... except strong type hinting...

And that's what makes people angry about PHP. There is always a "but" or "except" at the end of the sentence.

“I'm the mighty PHP Elephant! Want to write some codeZ?”

Every design pattern can be implemented easily in PHP, since its OOP has been inspired by C++ and Java. You can even do more.

Because PHP behaves like a dynamic language too, you have magic methods that allow you to perform some cool but unpopular things. For example there are magic getters and magic setters that are called every time you try to access a property that does not exist. Since PHP doesn't mix the namespaces of methods and members in an object, they cannot clash. There is also a magic method called every time you try to call a method that does not exist.

There are some interesting use cases for this: e.g. one can build a simple XML lib where your objects map exactly to the XML tree... and there IS in fact a lib called SimpleXML that takes advantage of this. Back in my PHP days I coded a lib inspired by SimpleXML but without its various gotchas. It was fun and entertaining, and I found the language really flexible. Combine that with the ArrayAccess interface, and the lib allowed one to navigate easily from one XML node to another, in a concise manner, in a way that cannot be done in C++ or Java.

There is also the traits feature that is useful once in a while. It permits class aggregation. It's useful because PHP does not support multiple inheritance. It's not a bad thing, since multiple inheritance is broken by design... see the diamond problem.

So it's untrue to claim that PHP's OOP is badly designed.

PHP is great at copying and integrating every language feature that its creators want. It even integrates goto, a feature that was introduced a very long time after OOP: that one probably caused the greatest shitstorm in PHP's entire life! In fact, PHP can shamelessly absorb anything.

So that turns out to be the main problem with PHP: its lack of identity. It was a procedural language, then became an OOP language, so more often than not, PHP is totally schizophrenic about that. Some core features are sometimes functions, thus massively polluting the global namespace, sometimes they are behind a class, and sometimes they use BOTH! Yes, you read that right: PHP core itself does not know who he/she/it is. Example? The date_* function collection and the DateTime class: you can use both DateTime::diff() and date_diff().

Okay, this is not critical, but that's what makes PHP the Frankenstein of programming languages. From Wikipedia:

One criticism of PHP is that it was not originally designed, but instead it was developed organically.

Website... wantz... to... live...

That's the story of PHP. There is no consistency anywhere.
You have a function named htmlentities() to encode and html_entity_decode() to decode. Seriously, WTF??? Looks like people there cannot stick to a convention.

There is a core function named strcmp() WITHOUT underscore, and one named str_replace() WITH underscore. Speaking of that one, have a look at its prototype:

str_replace(search, replace, subject [, count ] )

... versus:

str_split(subject [, split_length ] )

Yes, that's another gotcha: the first one wants the subject string to be its third argument, the second wants it to be its first argument. Ugly. There are plenty of examples of inconsistencies in PHP, and after years of programming with it, I still happen to forget the correct order for a particular function.

Also PHP does NOT support Unicode natively: you have to rely on a very small subset of the string manipulation functions, those prefixed by mb_*. When I started using PHP, most of those functions were missing, and it's still painful to write Unicode-aware code. This is probably PHP's greatest weakness. Almost all websites are using UTF-8 now.

There was a failed attempt to upgrade PHP's core with unicode, PHP 6, but it was miserably aborted.

Finally, PHP has errors AND exceptions. Silly.

I'm not going to list all of PHP's failures here, there are dozens of websites doing that. For example, PHP: a fractal of bad design is a really well-documented article.

So there are two kinds of people: people that can cooperate with a weird but capable language, and perfectionist people that will hate it from day one.

A short story of modern Javascript

The Javascript language has its haters too.

However, most of the arguments against it exist because Javascript is the most misunderstood language. For decades, not a single book was accurate or written by someone who had a clue about the language's very nature, except Douglas Crockford's.

Worse: Internet Explorer totally distorted it for years, and we all know the harm Microsoft can do when it has decided to totally break any living standard for the sake of its monopoly... Programmers exposed to Javascript back in the IE 5.5 and IE 6 days have been totally traumatized. And I was one of those. By the way, Microsoft's JScript is *NOT* Javascript, but sadly no one cares about that.

Cool story, bro!

Then Javascript had a second life:

jQuery helped to abstract away browser incompatibilities.
Google Chrome appears, rapidly conquers its share of the market, and initiates a browser war that Microsoft cannot win anymore, leading to the JS engine race we all know: suddenly Javascript becomes lightning fast! Also Internet Explorer was forced to follow the standards, or it would have lost even more market share.
Mozilla and Google have done a huge amount of work to push the language forward: Google is usually the innovative guy and Mozilla the perfectionist that polishes things and improves the spec before its standardization.
Thanks to V8's speed, Javascript becomes an option for server-side scripting: Node.js appears and its ecosystem grows really fast. Node.js adopts the CommonJS module system, and the last major gotcha of Javascript is finally gone: global variable leaks. Now Javascript is modular and fully capable of handling big projects. Npm is probably the best package manager in the world and solves dependency hell in an elegant way.

The Javascript language

Javascript is a dynamic language, featuring a prototype-based OOP.

It has dynamic typing, so duck typing is really common here. Almost everything in Javascript is an object. Object literals are common too, and more often than not they are used to pass named parameters to a function. Objects are basically hashmaps, optionally with a prototype chain (also useful for inheritance), and since functions are first-class citizens, a function stored in a property acts as a method. Finally, Javascript has closures.

Javascript is event-driven, so callbacks are used everywhere.

The beauty of Javascript is its simplicity and its expressiveness. With only a few keywords, almost any construct is possible.

Common criticisms

People often criticize the Javascript object model. Some claim that it is a procedural language, not an OOP language.

They are wrong. Usually, people confuse OOP with class-based OOP. Javascript is a prototype-based OOP language. See the Wikipedia pages about programming paradigms and about prototype-based programming.

There is a very interesting comment written by @Plynx, explaining the benefits of Javascript's OOP and how it differs from class-based OOP. The comment is very long, detailed and well-written; this quote is just a small part of it:

I should probably say a few words about encapsulation, since it's often cited by object oriented programmers as being a very important aspect of programming.

The ability to hide data and functionality is present in Javascript in the form of closures, but they are rarely used that way, and the history of encapsulation should make it clear why. Earlier I mentioned namespace pollution as one of the key problems in early structured programming languages. Modern taboo against global variables flows in part from the problems programmers had when name collisions would occur after the merge of disparate codebases. Additionally the concept of the object as an API, and the need to subclass in order to extend functionality while preserving type compatibility, led to the creation of the access qualifiers in OOP. Initially it was thought that this method of disallowing unintended use of classes would be a boon to group development, but a number of studies have shown that it did not improve productivity.

The primary benefit of encapsulated classes was the fact that they created namespaces, now a headline feature in class-based languages. Namespaces produce the benefit of not having name collisions while also documenting the intended use of a class, without hamstringing future development by restricting the possibility of unforeseen useful code sharing by prematurely disallowing it. In Javascript, every object being a dynamic associative array makes every {} a namespace (even more name control is possible through module.exports in Node). Namespaces are generally a superior approach to encapsulation, preserving the benefits without dealing with the grueling problems of strict access control to portions of classes in class-based languages.

Phew... Let's have a break...

People previously exposed to class-based OOP will often find Javascript off-putting: no classes, no private members... Javascript is fully capable of that, but you have to do it by yourself: a constructor is just a mere function that creates objects, and used in conjunction with prototypes it achieves the same thing as classes. Private members are possible through closures. That doesn't mean you need to code more to get that, and in fact you have to code less; it just means there is no class or private keyword.
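A minimal sketch of that do-it-yourself approach: a constructor function, one shared method on the prototype, and a private member kept in a closure. Counter and its method names are made up for the example.

```javascript
// A constructor is just a function called with `new`.
function Counter(start) {
  // Private member: only the two closures below can reach it.
  let count = start;

  this.increment = function () {
    count += 1;
    return count;
  };

  this.current = function () {
    return count;
  };
}

// Shared by every instance through the prototype chain.
Counter.prototype.describe = function () {
  return 'Counter at ' + this.current();
};

const counter = new Counter(10);
counter.increment();

console.log(counter.describe()); // Counter at 11
console.log(counter.count);      // undefined: not reachable from outside
```

The trade-off is that closure-based methods are created once per instance, while prototype methods are shared.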

However you can do more, if you want. A prototype is nothing more than a regular object: you can mutate it, and all instances created from that prototype see the change. Statically typed class-based languages such as C++ or Java cannot do that at runtime.
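A short sketch of what mutating a prototype at runtime looks like. Robot and greet() are illustrative names only.

```javascript
function Robot(name) {
  this.name = name;
}

// Both instances are created BEFORE the method exists.
const r2 = new Robot('R2');
const c3 = new Robot('C3');
console.log(typeof r2.greet); // undefined

// Mutating the prototype afterwards is enough.
Robot.prototype.greet = function () {
  return 'Beep, I am ' + this.name;
};

console.log(r2.greet()); // Beep, I am R2
console.log(c3.greet()); // Beep, I am C3
```

Nothing was copied into the instances: the lookup simply walks the prototype chain at call time.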

Anyway, it's worth noting that there is no need for private members at all in a dynamic language. Also there is no inherent reason to code a complex class hierarchy, which would lead to over-engineering, lasagna code or worse: spaghetti code. There is no need for such overhead in Javascript.

Good Javascript code embraces the KISS principle. KISS projects advance at a faster pace, they are robust, clean, healthy and easy to maintain.

Also, Javascript has some real gotchas:

arguments is not an Array, and it's boring to type Array.prototype.slice.call( arguments ) every time you want Array features...
the + operator can mix strings and numbers: it performs concatenation or addition.
you can't easily inherit from builtin objects (e.g. Array)
typeof null === 'object': this one really sucks... it's the thing I hate the most about Javascript... The workaround is easy, still, I hate to type if ( myVar && typeof myVar === 'object' ) all day long... And it won't be fixed :'(
UCS-2 heritage instead of proper UTF-16: Javascript exposes strings as 16-bit code units, with surrogate pairs. E.g. a single character outside the Basic Multilingual Plane (an emoji, or a rare Chinese character) takes 2 characters in a Javascript string.
automatic semicolon insertion: semicolons are not mandatory to end a statement, they can be automatically inserted at the end of the line. A semicolon can be inserted by the parser where it's not intended by the coder, e.g. if the return keyword is followed by a new line.
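Most of the gotchas in this list can be reproduced in a few lines. This is just a quick sketch; all the variable and function names are made up for the example.

```javascript
// arguments looks like an array, but it is not one.
const argumentsIsArray = (function () {
  return Array.isArray(arguments);
})(1, 2);
const argumentsAsArray = (function () {
  return Array.prototype.slice.call(arguments);
})(1, 2);

// + adds numbers but concatenates as soon as a string is involved.
const addition = 1 + 2;
const concatenation = '1' + 2;

// The historical typeof quirk, and the usual workaround.
const typeOfNull = typeof null;
function isRealObject(value) {
  return Boolean(value) && typeof value === 'object';
}

// U+20BB7 is a rare Chinese character outside the Basic Multilingual Plane.
const rareCharLength = '\u{20BB7}'.length;
// U+4E2D is a very common one, inside the BMP.
const commonCharLength = '\u4E2D'.length;

// A semicolon is inserted right after `return`: the object is never returned.
function brokenReturn() {
  return
  { ok: true };
}

console.log(argumentsIsArray, argumentsAsArray); // false [ 1, 2 ]
console.log(addition, concatenation);            // 3 '12'
console.log(typeOfNull, isRealObject(null));     // object false
console.log(rareCharLength, commonCharLength);   // 2 1
console.log(brokenReturn());                     // undefined
```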

People usually add NaN to the list as well, because NaN stands for Not a Number and still typeof NaN === 'number', and because NaN !== NaN. However, they are wrong: NaN actually behaves just like it should. Really. Just read Javascript NaN Demystified if you don't believe me.
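For reference, here is the NaN behavior in question as a short sketch. Number.isNaN() is not mentioned above: it is the standard ES2015 way to test for NaN without type coercion, shown here next to the older global isNaN().

```javascript
const notANumber = 0 / 0;

const nanType = typeof notANumber;               // 'number'
const nanEqualsItself = notANumber === notANumber; // false
const detected = Number.isNaN(notANumber);       // true

// The global isNaN() coerces its argument first, Number.isNaN() does not.
const looseCheck = isNaN('hello');         // true: 'hello' coerces to NaN
const strictCheck = Number.isNaN('hello'); // false: a string is not NaN

console.log(nanType, nanEqualsItself, detected, looseCheck, strictCheck);
```

NaN is a numeric value of the IEEE 754 standard meaning "invalid result", and that standard also says it compares unequal to everything, itself included.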

Let's compare them!

This thing is SO broken!

Where PHP wins:

PHP is very classic: if you come from C/C++/Java & co, you will find similar concepts.
Easier to grasp: partly as a consequence of the previous point, you will be able to code in PHP in no time. You will be able to code in Javascript quickly too, but your code will be crappy even if you have some solid knowledge of other languages. You will have to realize that Javascript does not work the way you are used to, and you will have to change your habits.
PHP has destructors, Javascript doesn't... sometimes I miss them.

Where Javascript/Node.js wins

Unicode support: UCS-2 is suboptimal, however it's still far better than no support at all.
Isomorphic Javascript: you can share modules between your server-side Node.js codebase and your browser-side codebase. Browserify can help you in the process.
Less boilerplate code.
Faster: as a consequence of the previous point, it's possible to write applications faster, and to achieve great things with a smaller codebase.
Easier to maintain: above-average code is easier to maintain in Javascript than in PHP. Please note that it is possible to write total crap in Javascript, so this does *NOT* apply to below-average code. PHP can be very crappy too, but its paradigm guides the programmer a bit more.
Cleaner core: no global namespace pollution here, the language is rather consistent, and almost every core feature is OO.
Expressiveness and flexibility: Javascript is one of the most expressive languages, and wins this battle hands down.
Javascript is event-driven, dealing with I/O is way faster and easier, internet is more and more about realtime and Node.js is a strong player in that area.
Websocket: consequence of the previous point, Javascript is *THE* language of choice if you want to deal with websocket.
Freedom: you build your own server or micro-service, so everything is possible, you are not tied to the PHP paradigm.
Evolution: the Javascript language evolves quickly and far more openly. New features have been well weighed. Some quirks of the past will remain for the sake of backward compatibility, but at least new additions will be clean. On the contrary PHP is not really open: a few people have total power over the future of the language and are not listening to criticism at all.

Conclusion

As you can see, Javascript scored more points than PHP. And that's not going to change anytime soon.

In fact PHP is evolving too. Some features were missing and they were added. But not in a sane way. The path of PHP is wrong.

So let's finish the talk with two cornerstones of PHP 5 that allow it to be framework-compliant: autoloading and namespaces.

When autoloading was included, I was like “Nice! This is really useful and solves a lot of trouble I had in the past”. And that's right: PHP is greater with autoloading than without. However, looking back, it appears that autoloading was an infamous monstrosity. There was a problem, but it was fixed in the worst possible way. It is really hacky.

It was combined later with namespaces to avoid massive global scope pollution by third parties. In a way, it opened the road to framework interoperability, but since it does not solve the whole thing, the frameworks had to do what PHP should have done by itself. It led to a consortium of frameworks and those awkward PSR-x specs. But shitty PHP frameworks will be explored in the next article of this series.

It's worth noting PHP's namespace syntax uses backslashes as separators. That's very unusual and I don't like it.

That does not fix the main issue with PHP: require() and include() (and therefore autoloading) in PHP work the same way the #include preprocessor directive works in C/C++: it's just a kind of runtime copy-paste of the included file into the parent file. Therefore, it is totally leaky by design!

Stop that leak!

Now that I'm a Node.js developer, it's even more evident that PHP was wrong. Node.js solves the dependency hell in the most elegant way: you require a lib (i.e. a module) into a variable. A Node.js module returns (i.e. exports) something into the caller's variable. Usually a module either exports a function or an object containing a collection of functions.

Okay, an example is worth a thousand words: let's say we have a function called toto(). Later we find a wonderful third party lib, but unfortunately its name is "toto" too. No problem, let's load the module into the "moduleToto" variable:

var toto = function() {
    // My code
}

// Load toto into moduleToto
var moduleToto = require( 'toto' ) ;

// Use the third party lib
moduleToto.makeSomethingAwesome() ;

Nothing leaks outside of a module.

No pollution.

Furthermore, in Node.js proper namespaces would be useless because they already exist the Javascript way: you just have to put submodules into properties of the main module.
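A minimal sketch of that pattern. To keep it runnable in a single file, the two "submodules" are plain objects here; in a real package each of them would typically live in its own file, be loaded with require(), and the main object would be assigned to module.exports. All the names (myLib, stringTools, arrayTools) are hypothetical.

```javascript
// Stand-in for a submodule, e.g. what require( './string.js' ) would return.
const stringTools = {
  capitalize: function (str) {
    return str.charAt(0).toUpperCase() + str.slice(1);
  }
};

// Stand-in for another submodule, e.g. require( './array.js' ).
const arrayTools = {
  last: function (array) {
    return array[array.length - 1];
  }
};

// The main module: each property acts as a namespace.
// In a real package: module.exports = myLib ;
const myLib = {
  string: stringTools,
  array: arrayTools
};

console.log(myLib.string.capitalize('toto')); // Toto
console.log(myLib.array.last([1, 2, 3]));     // 3
```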

So simple.

So KISS.

That's why I like Javascript.

From PHP to Node.js - Part. II: Performances

Written 6 years ago June 2nd, 2015 by Cédric Ronvel
node.js · node.js vs php

Ready yourself for the match!

Benchmarks: Truth and Lies

First, some raw benchmarks of PHP versus Javascript powered by V8, provided by The Benchmarks Game here:

PHP vs Javascript

According to those benchmarks, the median result shows Javascript running roughly 10 times faster than PHP.

We all know that benchmarks lie, but it's still pretty well known and well accepted that Javascript/V8 is faster than PHP. It's hard to tell how much faster Javascript/V8 is for real-world applications, but expect +20% to +50% of raw speed in favor of it.

So we have a language+engine combination that, at its root, is faster. However we want to build website & webapp, so for this particular task, which one is faster?

Here it will be VERY difficult to come up with a meaningful benchmark, because of the very nature of the PHP paradigm. PHP's performance is tied to the actual web server (Apache? NginX? Lighttpd?), how it interacts with it (mod_php? cgi? fast-cgi? php-fpm?), and the opcode cache (none? APC? OPcache?).

You can find benchmarks all over the internet comparing PHP & Node.js, but be sure that they are all biased, if not deliberately lying in favor of the author's language of choice.

You will find that most of the time, they are in favor of Node.js... but truth should be told: a "Hello World" benchmark comparing Node.js vs Apache Prefork MPM + mod_php is just bottomless stupidity... And sadly that's what you will get if you google "php nodejs benchmark". At least give PHP a chance to shine!

Hey! If you are the author of one of those awful benchmarks, here is a hint about how a real-world PHP stack works: first a pool of NginX front servers, then a pool of PHP-FPM servers in the back-end. So at least for your glorious benchmark, create 2 VMs, one with NginX and one with PHP-FPM... Seriously: Apache Prefork MPM *alone* (serving a static "Hello world" HTML) is already slower than Node.js! Do you even know what your benchmarks are measuring?

We should give up on benchmarks here and accept that while it's easy to test Javascript/V8 vs PHP/Zend, we will never ever have a meaningful real-world web application benchmark.

Beyond Benchmarks: Some Rationale

Remember: those are just the fastest PHP and JavaScript V8 programs measured on this OS/machine.

-- The benchmark game

But that does not mean that we do not have any clue!

We said the very nature of PHP makes things hard to test; however, it is exactly this way of doing things that leads to a strong belief: the PHP paradigm itself is way slower than the Node.js paradigm.

Fight!

A typical PHP app works this way:

the webserver (Apache or NginX or Lighttpd) receives a client HTTP request for some kind of dynamic content
the webserver dispatches the request to an idle PHP process
the PHP process loads the cached opcode of the script
the PHP process runs the script
the script runs its startup/init code:
- if a framework is involved, the framework core is loaded and its startup code is run: this is usually the part that hurts performance the most
- the PHP script may load session data attached to the requesting user/client, involving a memcached or a database round-trip
- some application-specific startup may eventually run
the script processes the request, and finally sends content back to the webserver
the webserver forwards the response to the client
the PHP process is recycled, waiting for the next request

A request handled by PHP is more or less like launching a program just for that request, terminating the program when the client is served, launching the program again for the next request, terminating it, and so on.

The process itself can be recycled thanks to PHP-FPM, and most database drivers offer the persistent connection feature so connections get recycled too: that saves a lot of resources. But ultimately, your script gets started and terminated for every single request.

It's a fact that the PHP Zend engine spends way more time executing startup code (mostly because of frameworks) than running the actual business logic of the application.

A typical Node.js app works this way:

optional: NginX (used as a load balancer, HTTPS to HTTP bridge, or websocket bridge) receives a request or establishes a connection with the client
- then it dispatches the request to one of the Node.js service workers
the Node.js service processes the request or accepts the client connection (by the way, all startup code was already executed when the service was launched)
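The long-lived service model in the steps above can be sketched with Node's built-in http module. This is only an illustration: expensiveStartup() is a hypothetical placeholder for whatever a framework does at boot, and the listen() call is left commented out (with an arbitrary port) so the snippet does not keep running.

```javascript
const http = require('http');

let startupRuns = 0;
let requestsServed = 0;

// Hypothetical placeholder for framework boot, config loading, etc.
function expensiveStartup() {
  startupRuns += 1;
  return { greeting: 'Hello' };
}

// Startup code runs ONCE, when the service is launched...
const config = expensiveStartup();

// ... while this handler runs once per request and reuses what startup built.
function handleRequest(req, res) {
  requestsServed += 1;
  res.end(config.greeting + ', request #' + requestsServed);
}

const server = http.createServer(handleRequest);

// Uncomment to actually serve requests (the port is an arbitrary choice):
// server.listen( 8080 ) ;
```

However many requests come in, expensiveStartup() is never called again: only the handler's own work is paid per request.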

A Node.js service can run for days!

In fact there is no particular limit, it all depends on the quality of your code.

If your code is full of bugs, then the service will probably crash and be resurrected by forever every two or three minutes. If your code is clean enough, you can expect it to serve thousands or millions of clients before exiting cleanly.

So the time V8 spends on your startup code is really nothing compared to the time it spends on the actual business logic of your application.

On the contrary, for each line of your business logic the Zend engine processes, it processes 10, 20, 50 or 100 lines of your framework startup code.

With that in mind, do the math, and you will understand easily why Node.js is so powerful.

The PHP paradigm is defeated.

PHP is rarely the bottleneck?

Some will argue that PHP is rarely the bottleneck.

That's true: within PHP's paradigm, PHP is rarely the bottleneck. I mean, the Zend engine is fast enough: not as fast as V8, but already pretty fast (you may know that PHP is actually faster than Python). It's not PHP/Zend itself that is slow, it's the PHP way of doing things that is not relevant anymore.

The CGI way is doomed... You can evolve it into FastCGI/PHP-FPM, which will improve things, but the main issue remains unsolved.

Most probably, if you turn your PHP code into C++, and port all your framework to C++ as well, you will get slightly better performance, but you will not get the best your hardware can deliver anyway.

The truth is that most of the time, your PHP script is waiting for I/O, and I/O is the most time-consuming part. That IS the reasoning behind “PHP is rarely the bottleneck”.

However, it only applies to web applications that are I/O-bound (which is the case for at least 95% of the web); if your application is CPU-bound, PHP will surely become the bottleneck. Web applications that need complex data processing, Artificial Intelligence (e.g. board games), file-sharing with a complex layer of user access management, file-synchronization... are CPU-bound applications. I should mention here that Node.js has a hard time with CPU-bound tasks too: some strategies must be employed, or the event loop will drop dead.

However, being the bottleneck or not is one thing, but that's just one aspect of the performance that matters to us.

“PHP is rarely the bottleneck” suggests that the time to process a single request is not so much tied to the speed of PHP. But what happens when your application is literally besieged by your clients/users? How much time will each request take? How many requests will be dropped?

There is indeed at least one bottleneck: the number of PHP processes available.

Crowded!

The Power of Event-driven & non-Blocking I/O

Long-lived services capable of handling thousands of concurrent connections taking advantage of a non-blocking evented I/O system are *THE* way to do it now.

When your service is waiting for a response from the database, PHP is just lazily idling, while Node.js will simply accept another concurrent request, start to process it until a new database round-trip is needed... When the database response is available, the request processing is resumed, and so on. That's why a single PHP process can accept one request at a time while a single Node.js process can accept thousands of them.
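The interleaving described above can be sketched in a few lines. fakeDbQuery() is a hypothetical stand-in for a real database driver (it just waits 50 ms), and serveAll() is an illustrative helper, not part of any library.

```javascript
// Stand-in for a database round-trip: resolves after `ms` milliseconds.
// (Hypothetical helper; a real driver would perform network I/O here.)
function fakeDbQuery(id, ms = 50) {
  return new Promise((resolve) => setTimeout(() => resolve(`row-${id}`), ms));
}

// Starts `count` requests in a single process and waits for all of them.
async function serveAll(count) {
  let inFlight = 0;
  let peakInFlight = 0;

  async function handleRequest(id) {
    inFlight += 1;
    peakInFlight = Math.max(peakInFlight, inFlight);
    const row = await fakeDbQuery(id); // the process is free while waiting
    inFlight -= 1;
    return `response for ${row}`;
  }

  const started = Date.now();
  const responses = await Promise.all(
    Array.from({ length: count }, (_, id) => handleRequest(id))
  );
  return { responses, peakInFlight, elapsedMs: Date.now() - started };
}

serveAll(1000).then(({ responses, peakInFlight, elapsedMs }) => {
  console.log(`${responses.length} requests, ${peakInFlight} in flight at once, ${elapsedMs} ms`);
});
```

All the requests are waiting on their "database" at the same time, so the total elapsed time stays close to a single round-trip instead of growing with the number of requests.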

Creating a process is pretty damn slow, so the PHP strategy is to create a pool of reusable processes.

However each PHP process has its own share of memory, so spawning thousands of them has a cost. Not to mention that CPU context-switching has a cost too.

With Node.js, you can save a lot of CPU. Just spawn as many Node.js services as there are CPU cores available on your system. Evented non-blocking I/O will do the trick for you: each client request will trigger your request callback, it's easy and reliable.

Hanging requests waiting for I/O do not take much memory: they cost far less than spawning processes. There is also almost no context switching, since everything happens inside the same Node.js process, and that process has a dedicated CPU core (remember: we spawn as many Node.js services as there are cores available).

More interesting: your service can happily eat as many resources as are available on your system. Your database back-end is congested? No problem, Node.js can still accept client requests, while PHP would be limited to the number of processes in the pool. With Node.js, there is always room for another request. Whenever a request gets its precious I/O, your application can serve it to your client.
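One possible way to get "one service per core" is Node's built-in cluster module; this is a sketch of that approach, not necessarily the setup used by the author. It assumes Node.js 16 or later (for cluster.isPrimary), and the port number and function names are arbitrary.

```javascript
const cluster = require('node:cluster');
const http = require('node:http');
const os = require('node:os');

// One worker per CPU core.
function workerCount() {
  return Math.max(1, os.cpus().length);
}

function startService(port = 8080) { // the port is an arbitrary choice
  if (cluster.isPrimary) {
    // The primary process only spawns the workers.
    for (let i = 0; i < workerCount(); i++) cluster.fork();
    cluster.on('exit', (worker) => {
      console.log(`worker ${worker.process.pid} exited`);
    });
  } else {
    // Each worker runs this same file and shares the listening port.
    http.createServer((req, res) => {
      res.end(`served by worker ${process.pid}\n`);
    }).listen(port);
  }
}

// startService(); // call this at the top level of your entry point
```

The call has to sit at the top level of the entry file, because every forked worker executes that same file again and takes the else branch.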

Processing...

However, as mentioned earlier, Node.js has some pitfalls to be aware of when your application has many CPU-bound tasks.

The event loop MUST keep spinning fast: that's the key to a blazing fast web application. If some CPU-bound task kicks in without giving control back to the event loop, the event loop is blocked. That's it: other lightweight tasks waiting for their I/O will freeze until the CPU-bound task ends (or at least gives control back to the event loop). Node.js is single-threaded: tasks don't run in parallel, they run concurrently.

If you have a CPU-bound task, you should run it on another Node.js process. A pool of workers is what you need.
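"Giving control back to the event loop" can look like this: a long computation is cut into chunks, and setImmediate() schedules the next chunk so that timers and I/O callbacks get a chance to run in between. The workload (a plain sum) and the chunk size are arbitrary choices for the illustration.

```javascript
// Blocking version: nothing else can run until the loop is finished.
function sumBlocking(n) {
  let total = 0;
  for (let i = 1; i <= n; i++) total += i;
  return total;
}

// Cooperative version: same result, but it yields every `chunk` iterations.
function sumCooperative(n, chunk = 100000) {
  return new Promise((resolve) => {
    let total = 0;
    let i = 1;
    function step() {
      const end = Math.min(i + chunk - 1, n);
      for (; i <= end; i++) total += i;
      if (i <= n) setImmediate(step); // give control back to the event loop
      else resolve(total);
    }
    step();
  });
}

// A timer standing in for the other lightweight tasks of the service.
let ticks = 0;
const timer = setInterval(() => { ticks += 1; }, 1);

sumCooperative(5000000).then((total) => {
  clearInterval(timer);
  console.log(`total=${total}, timer fired ${ticks} times meanwhile`);
});
```

With sumBlocking() the timer could not fire at all before the result is ready; with the cooperative version it keeps ticking. The total CPU time spent is still the same, it is only shared more politely.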
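As a rough sketch of that idea, the snippet below pushes a deliberately slow computation (a naive Fibonacci, chosen only as an example workload) to separate Node.js processes, with at most `size` of them running at once. The names fibInChild() and fibPool() are invented for this illustration. For brevity it launches a fresh process per task with child_process.execFile(); a real pool would rather keep its workers alive and reuse them.

```javascript
const { execFile } = require('node:child_process');

// Runs a naive (slow) Fibonacci in another Node.js process.
function fibInChild(n) {
  if (!Number.isInteger(n) || n < 0) {
    throw new TypeError('n must be a non-negative integer');
  }
  const script = `
    const fib = (k) => (k < 2 ? k : fib(k - 1) + fib(k - 2));
    process.stdout.write(String(fib(${n})));
  `;
  return new Promise((resolve, reject) => {
    // process.execPath is the path of the currently running node binary.
    execFile(process.execPath, ['-e', script], (err, stdout) => {
      if (err) reject(err);
      else resolve(Number(stdout));
    });
  });
}

// Minimal pool: at most `size` child processes run at the same time.
async function fibPool(inputs, size = 2) {
  const results = new Array(inputs.length);
  let next = 0;
  async function lane() {
    while (next < inputs.length) {
      const index = next++;
      results[index] = await fibInChild(inputs[index]);
    }
  }
  await Promise.all(Array.from({ length: Math.min(size, inputs.length) }, lane));
  return results;
}

fibPool([20, 21, 22, 23]).then((values) => console.log(values));
// [ 6765, 10946, 17711, 28657 ]
```

The main process only waits for the results, so its event loop stays free to serve other requests while the children burn CPU.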

This article explains how to overcome those issues.

Final Words

There are no real metrics for that, but I think I made enough good points here, so let's cut to the chase: Node.js is way faster than PHP.

In my experience, as I said at the beginning of the article, you can expect +20% to +50% of raw performance in favor of Node.js. By raw performance, I mean: code two CLI programs, one in JavaScript and the other in PHP... the JS program will beat the PHP program, running 20% to 50% faster. Let's be fair: that's JavaScript without asm.js. By the way, JavaScript taking advantage of asm.js is just in another league: it reaches 50% of the speed of C, you can do 3D games with that.

The C speed, a.k.a. the speed of the light! ;)

But we don't want to code CLI programs or 3D games: that's outside the scope of PHP anyway.

So... for the back-end of a Software as a Service, powering a modern web app? Now that involves the full PHP stack (Nginx, PHP-FPM, Memcached/Redis, databases, and possibly a PHP framework), not PHP alone.

In my experience, you can expect Node.js to be 5, 10 or 20 times faster. Even more if the PHP team has chosen one of the full-featured frameworks (e.g. Zend or Symfony2) over a micro-framework. Even more if they have chosen one of the slowest frameworks (e.g. CakePHP, etc).

One may argue that frameworks should not be taken into consideration. However, there are some facts: almost every PHP project uses a framework. Frameworks are part of the PHP ecosystem, they are generally slow, and because of the PHP paradigm, a PHP framework has more impact on performance than a Node.js framework (mainly because of startup code).

Oh, and just don't speak about CMSes or blogs... Can we even compare WordPress to Ghost? Yeah, WordPress is the slowest piece of software ever made, I can't blame PHP for that.

Finally, PHP can overcome its greatest weakness: I will discuss another day what the PHP of the future should be if it wants to survive in the long run. But I'm keeping that for one of the next articles on this subject!

Stay tuned!

Soulserv Team

Web Jutsu & Start Up