Fast rendering in AIR 3.0 (iOS & Android)
While we await the release of Stage3D, a lot of developers seem to be under the impression that you cannot build performant games on mobile devices with AIR 3.0.
The truth is, with a little help, the power of the GPU is already unlocked for us!
In this write-up, I’ll show you how to increase rendering performance in AIR by 5x over the normal display list. The end result is a rendering engine which is faster than iOS 5’s hardware-accelerated canvas! (more on this in a follow-up post…)
Say What?
The Adobe engineers have done a fantastic job of re-architecting the GPU Render Mode in AIR 3.0. With a little bit of work on the developer’s end, we can coax amazing performance out of these little devices. So, how does it work?
- Set <renderMode>gpu</renderMode> inside your -app.xml file.
- Use Bitmaps to render your content
- Cache bitmapData across your classes/application.
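For the first step, here is roughly where the setting lives in the application descriptor. This is only a fragment, and the MyGame.swf file name is a placeholder I made up; the rest of your descriptor stays as it is.

```xml
<!-- Fragment of a hypothetical MyGame-app.xml; only renderMode matters here -->
<initialWindow>
    <!-- MyGame.swf is a placeholder for your own main SWF -->
    <content>MyGame.swf</content>
    <!-- Ask AIR to composite the display list on the GPU -->
    <renderMode>gpu</renderMode>
</initialWindow>
```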
That’s almost too simple, right? But it works, I promise :p The Bitmap class is just insanely optimized right now. Kudos to the entire AIR 3.0 team for such an amazing step forward.
So, to boil it down: Use the bitmapData.draw() API to manually cache your displayObjects, save that bitmapData object in a shared cache or static property, and then render a bitmap in place of your displayObject. Essentially, you are writing your own cacheAsBitmap function, but one using a shared cache.
Once an asset is cached this way, as in the SnowBall class in the Example Code section below, I can simply spawn as many SnowBall instances as I need, and they will render with what is essentially full GPU acceleration. Note that in that simple example, your assets must have an internal position of 0,0 for this to work properly (but you can live with that for a 5x increase in speed… right? Or just add a few lines of code to figure it out…)
With the reusable CachedSprite class (also in the Example Code section below), I just create as many instances as I want. The first time I instantiate an asset type, there will be a draw hit, and an upload hit as it’s sent to the GPU. After that, these babies are almost free! I can also scale them up to 2x without seeing any degradation of quality. Did I mention that scale, rotation and alpha are almost free as well!?
The beauty of this method is that you keep the power of the displayList: you can nest items, scale, rotate and fade, all with extremely fast performance. This allows you to easily scale your apps and games to fit various screen dimensions and sizes, using standard AS3 layout logic.
[Update] We’ve released our finished game, which is built on this rendering technique, so try it out and see what you think! The lite version is totally free.
The game includes many spritesheets (4 enemy types, 3 ammo types, multiple explosions etc). It uses tons of scaling and alpha overlay, and runs really well on everything from a Nexus One to iPad 2.
iPhone / iPad: http://itunes.apple.com/us/app/snowbomber-lite/id478654005?ls=1&mt=8
Android: https://market.android.com/details?id=air.ca.esdot.SnowBomber Lite
Amazon Fire: http://www.amazon.com/esDot-Development-Studio-SnowBomber-Lite/dp/B0069D2DOI/
Benchmarks
Let’s get to the good stuff first, shall we? Charts!
We’ve run a stress test across a variety of devices, and the results are below. Each test consists of one shared texture with rotation, alpha and scale applied; the test continues to add sprites while trying to maintain 30fps. We compare standard CPU Mode rendering with the same test in GPU Mode.
You can view an HTML version of the test here, so you know what I’m talking about: http://esdot.ca/examples/canvasTests/
You can clearly see the massive difference in performance. Also, it’s important to note that the CPU tests are fairly well optimized, as we’re still using bitmaps with shared bitmapData properties, which is a good optimization technique. This is not some contrived example to make the CPU renderer seem poor.
[Update] Someone asked about copyPixels. copyPixels will land somewhere in between the two tests: it’s faster than the displayList, but slower (and considerably less flexible) than the shared bitmapData technique. As new devices come out with higher-resolution displays, copyPixels will fall further and further behind (see comments for more details).
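To give a feel for the kind of stress test described in this section, here is a minimal sketch of a driver that keeps adding sprites while the frame rate holds at 30fps. This is not the actual benchmark code: the StressTest class name, the batch size, and the once-per-second measurement are all my own assumptions, and it reuses the SnowBall class from the Example Code section. It also assumes the SWF frame rate is set well above 30 (say 60), otherwise the measured rate will hover right at the threshold.

```actionscript
package
{
    import flash.display.Sprite;
    import flash.events.Event;
    import flash.utils.getTimer;

    // Hypothetical stress-test driver, not the benchmark used for the charts.
    public class StressTest extends Sprite
    {
        private static const TARGET_FPS:int = 30;
        private static const BATCH:int = 10; // sprites added per measurement (arbitrary)

        private var frames:int = 0;
        private var lastTime:int;

        public function StressTest()
        {
            lastTime = getTimer();
            addEventListener(Event.ENTER_FRAME, onEnterFrame);
        }

        private function onEnterFrame(event:Event):void
        {
            if (!stage) return; // wait until we are on the display list

            frames++;
            var elapsed:int = getTimer() - lastTime;
            if (elapsed < 1000) return; // measure roughly once per second

            var fps:Number = frames * 1000 / elapsed;
            frames = 0;
            lastTime = getTimer();

            if (fps >= TARGET_FPS) {
                // Still holding the target, so pile on more sprites
                for (var i:int = 0; i < BATCH; i++) addSprite();
            } else {
                // Frame rate dropped: stop and report how far we got
                removeEventListener(Event.ENTER_FRAME, onEnterFrame);
                trace("Sprites on screen when the frame rate dropped:", numChildren);
            }
        }

        private function addSprite():void
        {
            // Every instance shares one bitmapData, with its own transform
            var s:SnowBall = new SnowBall();
            s.x = Math.random() * stage.stageWidth;
            s.y = Math.random() * stage.stageHeight;
            s.rotation = Math.random() * 360;
            s.alpha = 0.3 + Math.random() * 0.7;
            s.scaleX = s.scaleY = 0.5 + Math.random();
            addChild(s);
        }
    }
}
```

Running the same class once with renderMode set to cpu and once with gpu should give two sprite counts you can compare on your own device.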
Example Code
Ok, enough talk, let’s see some code!
For example 1, let’s say I have a SnowBallAsset class in an FLA, exported for ActionScript. It’s a nice vector snowball in a MovieClip, and I want to render it really fast in GPU Mode.
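package
{
    import flash.display.Bitmap;
    import flash.display.BitmapData;
    import flash.display.Sprite;

    public class SnowBall extends Sprite
    {
        //Declare a static data property, all instances of this class can share this.
        protected static var data:BitmapData;
        public var clip:Bitmap;

        public function SnowBall(){
            if(!data){
                //First instance only: rasterize the vector asset once
                var sprite:Sprite = new SnowBallAsset();
                //Round up so fractional sizes are not clipped
                data = new BitmapData(Math.ceil(sprite.width), Math.ceil(sprite.height), true, 0x0);
                data.draw(sprite, null, null, null, null, true);
            }
            //Every instance displays the same shared bitmapData
            clip = new Bitmap(data, "auto", true);
            addChild(clip);
            //Optimize mouse children
            mouseChildren = false;
        }
    }
}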
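The SnowBall class assumes the artwork inside the asset starts at 0,0. If yours doesn't, one way to handle it (a sketch under my own assumptions, not tested against every asset) is to measure the asset with getBounds(), shift the drawing so it lands inside the bitmap, and then move the bitmap back by the same amount. The OffsetSnowBall name is made up for this example.

```actionscript
package
{
    import flash.display.Bitmap;
    import flash.display.BitmapData;
    import flash.display.Sprite;
    import flash.geom.Matrix;
    import flash.geom.Rectangle;

    // Hypothetical variant of SnowBall that tolerates artwork not anchored at 0,0.
    public class OffsetSnowBall extends Sprite
    {
        protected static var data:BitmapData;
        protected static var bounds:Rectangle;
        public var clip:Bitmap;

        public function OffsetSnowBall()
        {
            if (!data) {
                var sprite:Sprite = new SnowBallAsset();
                // Bounds of the artwork in the asset's own coordinate space
                bounds = sprite.getBounds(sprite);
                data = new BitmapData(Math.ceil(bounds.width), Math.ceil(bounds.height), true, 0x0);
                // Shift the artwork so its top-left corner lands at 0,0 in the bitmap
                var m:Matrix = new Matrix();
                m.translate(-bounds.x, -bounds.y);
                data.draw(sprite, m, null, null, null, true);
            }
            clip = new Bitmap(data, "auto", true);
            // Move the bitmap back so the registration point matches the original asset
            clip.x = bounds.x;
            clip.y = bounds.y;
            addChild(clip);
            mouseChildren = false;
        }
    }
}
```

With this in place, a snowball whose registration point sits at its center should rotate around that center, just like the original vector asset.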
In this next example, we’ll make a similar class, but this one is reusable: you just pass in the class of the asset you want to use. Also, sometimes you want to maintain the ability to scale your vector and have it still look good. This can be achieved easily by oversampling the texture before it’s uploaded to the GPU.
package
{
    import flash.display.Bitmap;
    import flash.display.BitmapData;
    import flash.display.Sprite;
    import flash.geom.Matrix;
    import flash.utils.getQualifiedClassName;

    public class CachedSprite extends Sprite
    {
        //Declare a static data cache
        protected static var cachedData:Object = {};
        public var clip:Bitmap;

        public function CachedSprite(asset:Class, scale:int = 2){
            //Check the cache to see if we've already cached this asset
            var key:String = getQualifiedClassName(asset);
            var data:BitmapData = cachedData[key];
            if(!data){
                var instance:Sprite = new asset();
                //Optionally, use a matrix to up-scale the vector asset,
                //this way you can increase scale later and it still looks good.
                var m:Matrix = new Matrix();
                m.scale(scale, scale);
                //Size the bitmap for the scaled asset, otherwise the drawing is clipped
                data = new BitmapData(Math.ceil(instance.width * scale), Math.ceil(instance.height * scale), true, 0x0);
                data.draw(instance, m, null, null, null, true);
                cachedData[key] = data;
            }
            clip = new Bitmap(data, "auto", true);
            //Use the bitmap class to inversely scale, so the asset still
            //appears to be its normal size
            clip.scaleX = clip.scaleY = 1/scale;
            addChild(clip);
            //Optimize mouse children
            mouseChildren = false;
        }
    }
}
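A quick usage sketch for the class above. The SnowField name and the default count and area size are placeholders of mine; the only real pieces are CachedSprite and the SnowBallAsset from the first example.

```actionscript
package
{
    import flash.display.Sprite;

    // Hypothetical container that fills an area with cached snowballs.
    public class SnowField extends Sprite
    {
        public function SnowField(count:int = 200, areaWidth:Number = 480, areaHeight:Number = 800)
        {
            for (var i:int = 0; i < count; i++) {
                // Only the first call draws and caches; the rest reuse the shared bitmapData
                var ball:CachedSprite = new CachedSprite(SnowBallAsset, 2);
                ball.x = Math.random() * areaWidth;
                ball.y = Math.random() * areaHeight;
                // The texture was oversampled 2x, so scaling up to 2x should stay crisp
                ball.scaleX = ball.scaleY = 0.5 + Math.random() * 1.5;
                ball.rotation = Math.random() * 360;
                ball.alpha = 0.5 + Math.random() * 0.5;
                addChild(ball);
            }
        }
    }
}
```

Because the parent only ever touches the outer Sprite, it never needs to know that the inner Bitmap is scaled down by 1/scale.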
With this class, all assets of the same type use a shared texture that remains cached on the GPU, and it all runs smooth as silk! It really is that easy.
It’s pretty trivial to take this technique and apply it to a SpriteSheet, or a MovieClip with multiple frames. Any Actionscripter worth his salt should have no problem there, but I’ll post up some helper classes in a follow-up post.
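In the meantime, here is a rough sketch of how the same idea could stretch to a multi-frame MovieClip: step through the timeline once, draw each frame into its own bitmapData, and share the resulting list. This is my own guess at the approach, not the helper classes promised for the follow-up post; FrameCache and getFrames are made-up names, and like the first example it assumes the artwork sits at 0,0.

```actionscript
package
{
    import flash.display.BitmapData;
    import flash.display.MovieClip;
    import flash.utils.getQualifiedClassName;

    // Hypothetical helper: caches one BitmapData per timeline frame, per asset class.
    public class FrameCache
    {
        protected static var cache:Object = {};

        public static function getFrames(asset:Class):Vector.<BitmapData>
        {
            var key:String = getQualifiedClassName(asset);
            var frames:Vector.<BitmapData> = cache[key];
            if (!frames) {
                var clip:MovieClip = new asset();
                frames = new Vector.<BitmapData>();
                // MovieClip frames are 1-based
                for (var i:int = 1; i <= clip.totalFrames; i++) {
                    clip.gotoAndStop(i);
                    var data:BitmapData = new BitmapData(Math.ceil(clip.width), Math.ceil(clip.height), true, 0x0);
                    data.draw(clip, null, null, null, null, true);
                    frames.push(data);
                }
                cache[key] = frames;
            }
            return frames;
        }
    }
}
```

To animate, a Bitmap can swap its bitmapData property to the next entry in the list on each ENTER_FRAME, so every instance of the same animation still draws from the same shared set of textures.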
*Note: It’s tempting to subclass Bitmap directly and remove one extra layer from the display list, but I’ve found it’s beneficial to wrap it in a Sprite. Sprite has a mouseEnabled property for one thing, which a Bitmap does not (Bitmap extends DisplayObject directly rather than InteractiveObject), and using a Sprite is what allows you to easily oversample the texture without the parent knowing or caring about it.
Detail details…
So what’s really happening under the hood?
- With renderMode=gpu: when a bitmapData object is rendered, that bitmapData is uploaded to the GPU as a texture
- As long as you keep the bitmapData object in memory, the texture remains stashed on the GPU (this right here is the magic sauce)
- With the texture stashed on the GPU, you get a rendering boost of 3x – 5x! (depending on the GPU)
- Scale, Alpha, Rotation etc are all extremely cheap
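Since the texture sticks around for as long as the bitmapData does, it can help to keep all cached bitmapData in one place where you can also let go of it, for example between levels. Below is a minimal sketch of such a cache; TextureCache and its method names are my own invention. I would expect dispose() to release the GPU copy along with the bitmap memory, but treat that as an assumption rather than something verified here.

```actionscript
package
{
    import flash.display.BitmapData;

    // Hypothetical shared cache; holding a reference here is what keeps the texture alive.
    public class TextureCache
    {
        protected static var cache:Object = {};

        public static function store(key:String, data:BitmapData):void
        {
            cache[key] = data;
        }

        // Returns null when nothing is cached under this key
        public static function fetch(key:String):BitmapData
        {
            return cache[key];
        }

        // Frees the bitmap memory and forgets the entry
        public static function release(key:String):void
        {
            var data:BitmapData = cache[key];
            if (data) {
                data.dispose();
                delete cache[key];
            }
        }
    }
}
```

Only call release() once nothing on screen is still showing that bitmapData, since a disposed bitmapData can no longer be rendered.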
The gotchas
Now, there are some caveats to be aware of. GPU Mode does have a few quirks:
- You should do your best to keep your display list simple, reduce nesting wherever possible.
- Avoid blendModes completely, they won’t work (if you need a blendMode, just blend your displayObject first, use draw() to cache it, and render the cache in a bitmap)
- Same goes for filters. If you need to use a filter, just applyFilter() on the bitmapData itself, or apply it to a displayObject first, and draw() it.
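As an illustration of the filter workaround, here is a small sketch that bakes a glow into the cached bitmapData instead of applying it at render time. The BakedGlow class, the white glow color, and the padding value are all assumptions for the example; it also assumes the source has no scale or rotation of its own and that its artwork starts at 0,0.

```actionscript
package
{
    import flash.display.BitmapData;
    import flash.display.DisplayObject;
    import flash.filters.GlowFilter;
    import flash.geom.Matrix;
    import flash.geom.Point;

    // Hypothetical helper that draws a display object and bakes a glow into the result.
    public class BakedGlow
    {
        public static function bake(source:DisplayObject, padding:int = 8):BitmapData
        {
            // Leave room around the artwork so the glow is not cut off at the edges
            var data:BitmapData = new BitmapData(
                Math.ceil(source.width) + padding * 2,
                Math.ceil(source.height) + padding * 2,
                true, 0x0);

            // Draw the artwork offset by the padding
            var m:Matrix = new Matrix();
            m.translate(padding, padding);
            data.draw(source, m, null, null, null, true);

            // Apply the filter to the pixels themselves, once, on the CPU
            var glow:GlowFilter = new GlowFilter(0xFFFFFF, 1, padding, padding);
            data.applyFilter(data, data.rect, new Point(0, 0), glow);
            return data;
        }
    }
}
```

The returned bitmapData can then go into the shared cache like any other, with the Bitmap positioned at -padding, -padding so the artwork lines up where it did before. For blendModes, draw() accepts a blendMode as its fourth argument, so layers can be flattened into one bitmapData in a similar way.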
In our next post, we’ll do some more comparisons, this time with animated spritesheets. We’ll also post a small class we use to rip spritesheets on the fly.
Update: The post on spritesheets is up: http://esdot.ca/site/2012/fast-rendering-in-air-cached-spritesheets