Sunday, January 8, 2012

Fast rendering in AIR 3.0 (iOS & Android)

http://esdot.ca/site/2011/fast-rendering-in-air-3-0-ios-android

While we await the release of Stage3D, a lot of developers seem to be under the impression that you cannot build performant games on mobile devices with AIR 3.0.
The truth is, with a little help, the power of the GPU is already unlocked for us!
In this write-up, I’ll show you how to increase rendering performance in AIR by 5x over the normal display list. The end result is a rendering engine which is faster than iOS 5’s hardware-accelerated canvas! (more on this in a follow-up post…)

Say What?

The Adobe Engineers have done a fantastic job of re-architecting the GPU Render Mode in AIR 3.0. With a little bit of work on the developer’s end we can coax amazing performance out of these little devices. So, how does it work?
  • Set <renderMode>gpu</renderMode> inside your -app.xml file.
  • Use Bitmaps to render your content.
  • Cache bitmapData across your classes/application.
That’s almost too simple, right? But it works, I promise :p The Bitmap() class is just insanely optimized right now. Kudos to the entire AIR 3.0 team for such an amazing step forward.
So, to boil it down: use the bitmapData.draw() API to manually cache your displayObjects, save that bitmapData object in a shared cache or static property, and then render a Bitmap in place of your displayObject. Essentially, you are writing your own cacheAsBitmap function, but one that uses a shared cache.

Benchmarks

Let’s get to the good stuff first, shall we? Charts!
We’ve run a stress test across a variety of devices, and the results are below. Each test consists of one shared texture with rotation, alpha and scale; the test continues to add sprites while trying to maintain 30fps. We compare standard CPU Mode rendering with the same test in GPU Mode.
You can view an HTML version of the test here, so you know what I’m talking about: http://esdot.ca/examples/canvasTests/

You can clearly see the massive difference in performance. Also, it’s important to note that the CPU tests are fairly well optimized, as we’re still using Bitmaps with shared bitmapData properties, which is a good optimization technique. This is not some contrived example to make the CPU renderer seem poor.
[Update] Someone asked about copyPixels. copyPixels will be somewhere in between the two tests: it’s faster than the displayList, but slower (and considerably less flexible) than using the shared bitmapData technique. As new devices come out with higher resolution displays, copyPixels will fall further and further behind (see comments for more details).

Example Code

Ok, enough talk, let’s see some code!
For example 1, let’s say I have a SnowBallAsset class in an FLA, exported for ActionScript. It’s a nice vector snowball in a MovieClip, and I want to render it really fast in GPU Mode.
package
{
import flash.display.Bitmap;
import flash.display.BitmapData;
import flash.display.Sprite;

public class SnowBall extends Sprite
{
//Declare a static data property, all instances of this class can share this.
protected static var data:BitmapData;
public var clip:Bitmap;
 
public function SnowBall(){
 //Only the first instance pays for the draw; later instances reuse the shared bitmapData
 if(!data){
  var sprite:Sprite = new SnowBallAsset();
  data = new BitmapData(sprite.width, sprite.height, true, 0x0);
  data.draw(sprite, null, null, null, null, true);
 }
 clip = new Bitmap(data, "auto", true);
 addChild(clip);
 //Optimize mouse children
 mouseChildren = false;
}
}
}
Now I can simply spawn as many SnowBall()s as I need, and they will render with what is essentially full GPU acceleration. Note that in this simple example, your assets must have an internal position of 0,0 in order for this to work properly (but you can live with that for a 5x increase in speed… right? Or just add a few lines of code to figure it out…)
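For assets whose artwork does not start at 0,0, one way to "figure it out" is to measure the asset with getBounds() and shift the drawing by that offset. The following is only a minimal sketch under that assumption: the class name OffsetSnowBall is hypothetical, SnowBallAsset is the exported library symbol from the example above, and the asset is assumed to be non-empty (a zero-sized BitmapData would throw).

```actionscript
package
{
    import flash.display.Bitmap;
    import flash.display.BitmapData;
    import flash.display.Sprite;
    import flash.geom.Matrix;
    import flash.geom.Rectangle;

    // Hypothetical variant of SnowBall that tolerates artwork not anchored at 0,0.
    public class OffsetSnowBall extends Sprite
    {
        // Shared by all instances, exactly like the static data property above.
        protected static var data:BitmapData;
        protected static var bounds:Rectangle;
        public var clip:Bitmap;

        public function OffsetSnowBall()
        {
            if (!data) {
                var sprite:Sprite = new SnowBallAsset();
                // Bounds of the artwork in the asset's own coordinate space.
                bounds = sprite.getBounds(sprite);
                // Shift the artwork so its top-left corner lands at 0,0 in the bitmap.
                var m:Matrix = new Matrix(1, 0, 0, 1, -bounds.x, -bounds.y);
                data = new BitmapData(Math.ceil(bounds.width), Math.ceil(bounds.height), true, 0x0);
                data.draw(sprite, m, null, null, null, true);
            }
            clip = new Bitmap(data, "auto", true);
            // Move the bitmap back so it appears where the vector artwork would have been.
            clip.x = bounds.x;
            clip.y = bounds.y;
            addChild(clip);
            mouseChildren = false;
        }
    }
}
```

The offset is stored statically next to the bitmapData because every instance needs it to position its Bitmap, not just the first one that performs the draw.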
In this next example, we’ll make a similar class, but this one is reusable: you just pass in the class of the asset you want to use. Also, sometimes you want to maintain the ability to scale your vector, and have it still look good. This can be achieved easily by oversampling the texture before it’s uploaded to the GPU.
package
{
import flash.display.Bitmap;
import flash.display.BitmapData;
import flash.display.Sprite;
import flash.geom.Matrix;
import flash.utils.getQualifiedClassName;

public class CachedSprite extends Sprite
{
//Declare a static data cache
protected static var cachedData:Object = {};
public var clip:Bitmap;
 
public function CachedSprite(asset:Class, scale:int = 2){
 //Check the cache to see if we've already cached this asset
 var data:BitmapData = cachedData[getQualifiedClassName(asset)];
 if(!data){
  var instance:Sprite = new asset();
  //Optionally, use a matrix to up-scale the vector asset,
  //this way you can increase scale later and it still looks good.
  var m:Matrix = new Matrix();
  m.scale(scale, scale);
  //The bitmap must be large enough to hold the up-scaled drawing
  data = new BitmapData(instance.width * scale, instance.height * scale, true, 0x0);
  data.draw(instance, m, null, null, null, true);
  cachedData[getQualifiedClassName(asset)] = data;
 }
 clip = new Bitmap(data, "auto", true);
 //Use the bitmap class to inversely scale, so the asset still
 //appears to be its normal size
 clip.scaleX = clip.scaleY = 1/scale;
 addChild(clip);
 //Optimize mouse children
 mouseChildren = false;
}
}
}
Now I just create as many instances of this as I want. The first time I instantiate an asset type, there will be a draw hit, and an upload hit as it’s sent to the GPU. After that these babies are almost free! I can also scale this up to 2x without seeing any degradation in quality. Did I mention that scale, rotation and alpha are almost free as well!?
With this class all assets of the same type will use a shared texture, it remains cached on the GPU, and everything runs smooth as silk! It really is that easy.
It’s pretty trivial to take this technique and apply it to a SpriteSheet, or a MovieClip with multiple frames. Any ActionScripter worth their salt should have no problem there, but I’ll post some helper classes in a follow-up post.
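As a rough illustration of the multi-frame case, the sketch below rasterizes every frame of a MovieClip once and shares the resulting list between all users of that asset. It is not the helper class from the follow-up post: the class name FrameCache and its getFrames() method are hypothetical, and it assumes the artwork sits at 0,0, that no frame is empty, and that nested timelines do not need to advance (gotoAndStop() only moves the main timeline).

```actionscript
package
{
    import flash.display.BitmapData;
    import flash.display.MovieClip;
    import flash.utils.getQualifiedClassName;

    // Hypothetical helper: rasterizes each frame of a MovieClip asset once and shares the result.
    public class FrameCache
    {
        // Maps a qualified class name to its list of cached frames.
        protected static var cache:Object = {};

        public static function getFrames(asset:Class):Vector.<BitmapData>
        {
            var key:String = getQualifiedClassName(asset);
            var frames:Vector.<BitmapData> = cache[key];
            if (!frames) {
                var mc:MovieClip = new asset();
                frames = new Vector.<BitmapData>();
                // MovieClip frame numbers are 1-based.
                for (var i:int = 1; i <= mc.totalFrames; i++) {
                    mc.gotoAndStop(i);
                    var bmd:BitmapData = new BitmapData(Math.ceil(mc.width), Math.ceil(mc.height), true, 0x0);
                    bmd.draw(mc, null, null, null, null, true);
                    frames.push(bmd);
                }
                cache[key] = frames;
            }
            return frames;
        }
    }
}
```

A Bitmap can then be animated by assigning frames[n] to its bitmapData property on each tick; since every instance points at the same BitmapData objects, the textures are only uploaded once.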
*Note: It’s tempting to subclass Bitmap directly, and remove one extra layer from the display list, but I’ve found it’s beneficial to wrap it in a Sprite. Sprite has a mouseEnabled property for one thing, which a Bitmap does not (not sure why…), and using a Sprite is what allows you to easily oversample the texture, without the parent knowing or caring about it.

Detail details…

So what’s really happening under the hood?
  • With renderMode=GPU: when a bitmapData object is rendered, that bitmapData is uploaded to the GPU as a texture
  • As long as you keep the bitmapData object in memory, the texture remains stashed on the GPU (this right here is the magic sauce)
  • With the texture stashed on the GPU, you get a rendering boost of 3x – 5x! (depending on the GPU)
  • Scale, Alpha, Rotation etc are all extremely cheap
The beauty of this method is that you keep the power of the displayList: you can nest items, scale, rotate and fade, all with extremely fast performance. This allows you to easily scale your apps and games to fit various screen dimensions and sizes, using standard AS3 layout logic.

The gotchas

Now, there are some caveats to be aware of. GPU Mode does have a few quirks:
  • You should do your best to keep your display list simple; reduce nesting wherever possible.
  • Avoid blendModes completely, they won’t work. (If you need a blendMode, just blend your displayObject first, use draw() to cache it, and render the cache in a Bitmap.)
  • Same goes for filters. If you need to use a filter, just call applyFilter() on the bitmapData itself, or apply it to a displayObject first, and draw() it.
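The applyFilter() route from the last bullet can be sketched roughly as follows. This is an illustration rather than code from the original project: FilterBaker, bake() and the padding parameter are hypothetical names, and the source object is assumed to be untransformed with its artwork at 0,0.

```actionscript
package
{
    import flash.display.BitmapData;
    import flash.display.DisplayObject;
    import flash.filters.BitmapFilter;
    import flash.geom.Matrix;
    import flash.geom.Point;

    // Hypothetical helper: bakes a filter into a BitmapData so GPU mode only sees a plain texture.
    public class FilterBaker
    {
        public static function bake(source:DisplayObject, filter:BitmapFilter, padding:int = 16):BitmapData
        {
            // Leave room around the artwork so glows and shadows are not clipped.
            var w:int = Math.ceil(source.width) + padding * 2;
            var h:int = Math.ceil(source.height) + padding * 2;

            // Step 1: rasterize the unfiltered artwork, offset by the padding.
            var raw:BitmapData = new BitmapData(w, h, true, 0x0);
            raw.draw(source, new Matrix(1, 0, 0, 1, padding, padding), null, null, null, true);

            // Step 2: run the filter on the pixels, once, on the CPU.
            var baked:BitmapData = new BitmapData(w, h, true, 0x0);
            baked.applyFilter(raw, raw.rect, new Point(0, 0), filter);

            // The unfiltered copy is no longer needed.
            raw.dispose();
            return baked;
        }
    }
}
```

The returned bitmapData can be stored in the same shared cache as any other texture. When it is shown in a Bitmap, that Bitmap has to be shifted by -padding on both axes so the artwork lines up with its original position.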
In our next post, we’ll do some more comparisons, this time with animated spritesheets. We’ll also post a small class we use to rip SpriteSheets on the fly.
Update: The post on spritesheets is up:
http://esdot.ca/site/2012/fast-rendering-in-air-cached-spritesheets

[Update]

We’ve released our finished game, which is built on this rendering technique, so try it out and see what you think! The lite version is totally free.
The game includes many spritesheets (4 enemy types, 3 ammo types, multiple explosions etc). It uses tons of scaling and alpha overlay, and runs really well on everything from a Nexus One to an iPad 2.
iPhone / iPad: http://itunes.apple.com/us/app/snowbomber-lite/id478654005?ls=1&mt=8
Android: https://market.android.com/details?id=air.ca.esdot.SnowBomberLite
Amazon Fire: http://www.amazon.com/esDot-Development-Studio-SnowBomber-Lite/dp/B0069D2DOI/

Fast Rendering in AIR: Cached SpriteSheet’s

http://esdot.ca/site/2012/fast-rendering-in-air-cached-spritesheets

In a previous post I showed how proper use of AIR 3.0’s GPU RenderMode can boost your frameRate by 500% on mobile devices. Here we’ll look at how you can do the same thing, and get even bigger gains with your MovieClip animations (like 4000% faster! Seriously, that’s a real number…)
Now, the basic premise of the previous tutorial was to use a single bitmapData instance for each type of Sprite. We’d cache the sprite to a static bitmapData property, and then use that to render our Bitmap()s later.
The difference now is that instead of sharing a single bitmapData, we’re going to share an array of bitmapDatas. Let that sink in, read it again. Ok. And we’re also gonna cache frameLabels and numFrames so we can have some gotoAndPlay() action :)
There are like a million ways you can do this in Flash; here’s just one…
 

Step 1: MovieClip’s to SpriteSheet’s

The first step involves determining which of your assets will need to become SpriteSheets. Anything that is repeated many times, or is rendered constantly on screen, should be made into a SpriteSheet. For items that are only displayed briefly, and of which only a single instance occurs, you can just let the normal Flash rendering engine do its job. This is one of the beautiful things about GPU render mode: not everything needs to be cached. You can cheat a lot (i.e. straight embed library animations), as long as you optimize what’s important.
Note: Transforms are extremely cheap on the displayList with this method. So, if you’re just scaling, rotating, or moving, don’t make a SpriteSheet for it, just tween it instead. It’s only a little slower, and you save a ton of memory on the GPU.
Once you’ve decided which animations you want to accelerate, you’ll need to export them as a PNG sequence. We’ll use Zoe from gskinner.com to help. Zoe will take a SWF and convert each frame to a PNG; it will also inspect the timeline for any labels, and save all the data in a JSON file.
The steps to do so are as follows:
  • Take your animation, and move it into its own FLA. Save the FLA somewhere in your assets directory, and export the SWF.
  • Download and install Zoe: http://easeljs.com/zoe.html
  • Within Zoe, open the SWF you just exported. Zoe should auto-detect the bounds. Click “Export”.
If everything went smoothly, you now have a JSON and a PNG file within your assets directory. On to step 2!

Step 2: Playback the SpriteSheet’s in Flash, really really fast.

The next step is to load the JSON and PNG files into Flash, and play them back. We also want to make sure that all instances of a specific animation share the same SpriteSheet in memory; this is what will give us full GPU acceleration.
Including the JSON and Bitmaps is simple:
[Embed("assets/Animation.png")]
public var AnimationImage:Class;
 
[Embed("assets/Animation.json", mimeType="application/octet-stream")]
public var AnimationData:Class;
Next you need a class to take these objects, and figure out how to play them. This is essentially just a matter of analyzing the JSON file from Zoe, and cutting the big bitmapData up into small bitmapDatas. You also need to devise an API to play those frames, swapping the bitmapData each frame, and respecting your basic MovieClip APIs.
I wrote a simple class to aid in this called SpriteSheetClip.
//Just pass in the data from zoe...
var mc:SpriteSheetClip = new SpriteSheetClip(AnimationImage, AnimationData);
mc.gotoAndPlay("someLabel");
addChild(mc);
 
//For max performance, all cached sprites must be manually ticked
addEventListener(Event.ENTER_FRAME, onEnterFrame);
function onEnterFrame(event:Event):void {
      mc.step();
}
SpriteSheetClip directly extends Bitmap, and emulates the MovieClip API. Without going over the entire class, the core code here is the caching and ripping of the SpriteSheets that are passed in. Notice how I use the JSON data to get frameWidth and frameHeight, and getQualifiedClassName for my unique identifier. After that it’s a simple loop:
public static var frameCacheByAsset:Object = {};
 
public function SpriteSheetClip(bitmapAsset:Class, jsonAsset:Class){
 
_currentStartFrame = 1;
var assetName:String = getQualifiedClassName(bitmapAsset);
//Check cache, if cached, do nothing
if(frameCacheByAsset[assetName]){
 frameCache = frameCacheByAsset[assetName].frames;
 frameLabels = frameCacheByAsset[assetName].labels;
 
 _frameWidth = frameCache[0].width;
 _frameHeight = frameCache[0].height;
}
//If not cached, rip frames from bitmapData and grab json
else {
 //rip clip!
 var data:Object = JSON.parse(new jsonAsset().toString());
 var bitmap:Bitmap = new bitmapAsset();
 var spriteSheet:BitmapData = bitmap.bitmapData;
 
 _frameWidth = data.frames.width;
 _frameHeight = data.frames.height;
 
 frameLabels = data.animations;
 
 var cols:int = spriteSheet.width/_frameWidth|0;
 var rows:int = spriteSheet.height/_frameHeight|0;
 var p:Point = new Point();
 
 var l:int = cols * rows;
 frameCache = [];
 
 _currentStartFrame = 1;
 
 var scale:Number = drawScale;
 var m:Matrix = new Matrix();
 
 //Loop through all frames...
 for(var i:int = 0; i < l; i++){
  var col:int = i%cols;
  var row:int = i/cols|0;
 
  m.identity(); //Reset matrix
  m.tx = -_frameWidth * col;
  m.ty = -_frameHeight * row;
  m.scale(scale, scale);
  //Draw one frame and cache it
  var bmpData:BitmapData = new BitmapData(_frameWidth * scale, _frameHeight * scale, true, 0x0);
  bmpData.draw(spriteSheet, m, null, null, null, true);
  frameCache[i] = bmpData;
 }
 
 _currentEndFrame = i;
 numFrames = _currentEndFrame;
 
 _frameWidth *= scale;
 _frameHeight *= scale;
 
 //Cache frameData
 frameCacheByAsset[assetName] = {
  frames: frameCache, //Cache bitmapData's
  labels: frameLabels //Cache frameLabels
 };
}
//Show frame 1
this.bitmapData = frameCache[_currentStartFrame-1];
 
}
Now, using this class, we can make multiple copies of the same animation, and run them extremely cheaply. You can run hundreds of animations, even on the oldest of Android devices. On newer devices like the iPad 2 or Galaxy Nexus you can push upwards of 500-800 animations at once. Plus scaling, alpha and rotation are all very cheap.
You probably noticed in the code that, for performance reasons, my class will not update itself; it must be manually stepped! Rather than have a bunch of enterFrame listeners, I make it the responsibility of the parent class to call step() on all its running children, so there is a single enterFrame handler instead of hundreds.
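A parent that owns the single enterFrame handler could look something like the sketch below. AnimationLayer, addClip() and removeClip() are hypothetical names used for illustration; the only thing taken from the post is that SpriteSheetClip exposes a step() method.

```actionscript
package
{
    import flash.display.Sprite;
    import flash.events.Event;

    // Hypothetical container that ticks all of its SpriteSheetClip children from one handler.
    public class AnimationLayer extends Sprite
    {
        // Explicit list of running clips, so the loop never has to walk the display list.
        protected var clips:Vector.<SpriteSheetClip> = new Vector.<SpriteSheetClip>();

        public function AnimationLayer()
        {
            addEventListener(Event.ENTER_FRAME, onEnterFrame);
        }

        public function addClip(clip:SpriteSheetClip):void
        {
            clips.push(clip);
            addChild(clip);
        }

        public function removeClip(clip:SpriteSheetClip):void
        {
            var index:int = clips.indexOf(clip);
            if (index != -1) {
                clips.splice(index, 1);
                removeChild(clip);
            }
        }

        // One handler for the whole layer instead of one listener per animation.
        protected function onEnterFrame(event:Event):void
        {
            // Iterate backwards so a clip removed during step() does not skip its neighbour.
            for (var i:int = clips.length - 1; i >= 0; i--) {
                clips[i].step();
            }
        }
    }
}
```

Keeping the clips in a typed Vector rather than querying getChildAt() each frame is a design choice, not a requirement; it simply keeps the per-frame loop as cheap as possible.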
There’s a bit more to the class in terms of managing frames, so feel free to check it out in the attached source project. Be warned though, it’s a little buggy…. I consider this a sample implementation rather than production code, but do as you will.
Next up let’s run some benchmarks, and see how many of these we can push…

Benchmarks!

In this benchmark I will add as many animations as possible while maintaining 30 fps.
I couldn’t get a good shot of it running on device, so here’s a boring video of what the benchmark looks like on PC

Sunday, January 1, 2012

Replacing password authentication in PuTTY with a PPK key

 
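This is how to let a PPK key take the place of typing a password in PuTTY.
-------------------------------------------------
ssh can connect to a server without password authentication by using private_key/public_key authentication together with ssh_agent.
PuTTY was also known to support this feature for SSH1.
However, for security reasons, SSH1 key authentication is discouraged.
PuTTY, in its own way, tried to make the same thing possible over SSH2 by generating a private key with PuTTYgen, but as far as I know that work has not progressed yet.
The Puttygen in PuTTY 0.53b can convert an SSH2 private_key generated by OpenSSH into PuTTY's own key format.
With this approach, PuTTY can also connect over SSH2 without password authentication!
1. Generating the private key
First, connect with PuTTY to the server you want to access.
Then generate a key as follows.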
[admin@ns admin]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/admin/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/admin/.ssh/id_rsa.
Your public key has been saved in /home/admin/.ssh/id_rsa.pub.
The key fingerprint is:
ff:a5:10:ad:c8:7a:4f:40:42:69:df:c3:00:d3:a3:5b admin@ns.foobar.net
[admin@ns admin]$

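You do not have to enter a passphrase at this point, because you can set one separately later.
Now move the generated public key to authorized_keys and test whether the server accepts key-based login. (Note that mv replaces any authorized_keys file that already exists; if you already have one, append the public key to it instead.)
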
[admin@ns admin]$ mv .ssh/id_rsa.pub .ssh/authorized_keys
[admin@ns admin]$ ssh localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
RSA key fingerprint is 89:79:86:1b:cb:fc:a0:05:9c:65:88:b5:4c:1b:7f:c8.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
Last login: Sun Jan 11 00:43:26 2004 from 192.168.0.25
[admin@ns admin]$

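If you can connect without being asked for a password, as shown above, the first stage has succeeded. If the connection fails, a likely cause is that sshd is not configured to allow key authentication.
In that case, check that sshd_config (usually /etc/ssh/sshd_config) contains the following two lines.
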
PubkeyAuthentication yes
AuthorizedKeysFile      .ssh/authorized_keys

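If the lines are commented out, uncomment them; if they are missing, add them. Then restart sshd and try the steps above again.
For more details on this part, see the linked article "Using authentication keys" (인증키 사용하기).
2. Fetching the key and converting it to PuTTY's PPK format
Fetch the generated private_key with psftp or any other client that supports SSH2 SFTP. Here I used psftp, which ships with PuTTY. (HangulPuTTY prints its prompts in Korean; the two Korean lines below read: Trying user name "admin" and Password for admin@ns.foobar.net.)
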
C:\Program Files\HangulPuTTY>psftp admin@ns.foobar.net
사용자 이름 "admin"으로 시도합니다.
admin@ns.foobar.net 의 비밀번호:
Remote working directory is /home/admin
psftp> get ./.ssh/id_rsa
remote:/home/admin/.ssh/id_rsa => local:id_rsa
psftp>exit
C:\Program Files\HangulPuTTY>

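Now it is time to use puttygen to convert the key to PPK, PuTTY's own private key format. When you run puttygen, you will see a Load button toward the lower right.
Click it and open the id_rsa file you downloaded. The file type filter is set to ppk, so you have to switch it to show all files before the key becomes visible.
You will then see a message box that says something like "Succe... convert save".
It explains that the OpenSSH private key was imported successfully, and that to use this key you have to save it again as a PPK.
Now click "Save private key", just below Load, and save the key as a PuTTY Private Key (ppk).
If you do not set a passphrase here, a warning appears; do as you prefer. If you do set one, you can later use Pageant and still connect without typing a password.
3. Configuring PuTTY
These steps are based on iPuTTY (Korean PuTTY). The only difference should be the language, so I trust it is otherwise identical to the English version.
Start PuTTY and enter appropriate values for the host name and the saved session name. (The protocol must of course be SSH!)
Click "Connection" (접속) in the lower part of the left pane, then enter the account name.
Next, click SSH -> Auth (인증) under "Connection", and set the key file field to the path of the ppk file created earlier.
Click Session again and press Save to store the session.
Now for the big moment: connecting without password authentication. Press Open!
After a slight delay, a message like the following means it worked. (The two Korean lines read: Trying user name "admin" and Authenticated by agent: public key "imported-openssh-key".)
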
사용자 이름 "admin"으로 시도합니다.
에이전트로 인증되었습니다: 공개 키 "imported-openssh-key"
Last login: Tue Jan 13 03:05:24 2004 from 192.168.0.25
[admin@ns admin]$

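If you set a passphrase on the ppk, you will be asked for it.
In that case, enter the passphrase set on the ppk, not the account's original password.
4. Using Pageant
This part applies only to those who set a passphrase on the ppk.
When you run Pageant, a PuTTY icon wearing a hat appears in the tray on the right.
Right-click it and you will see an "Add Key" entry.
Clicking it opens a window where you can choose a key.
When you select the key, you are asked for its passphrase; enter the PPK passphrase.
After that, connecting to that session with PuTTY no longer asks for a passphrase.
To everyone who manages servers with PuTTY: now you can manage them more conveniently!
Finally, my thanks to perky, who develops the Korean PuTTY.
P.S.
The id_rsa.pub (= authorized_keys) generated earlier can also be reused. Copy it to your local machine with psftp or a similar tool.
Then, instead of generating a new key on every server with the steps above, use pscp to copy it to each server you need remote access to. (This overwrites any authorized_keys file already present on the target.)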
C:\Program Files\HangulPuTTY>pscp authorized_keys admin@anotherhost:.ssh/authorized_keys

Or, from the host where the key was generated:
[admin@ns admin]$ scp ~/.ssh/authorized_keys admin@anotherhost:.ssh/authorized_keys

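P.S. 2
The connection delay is caused by the server checking the hostname of the connecting client.
Registering the client's hostname and IP in the server's /etc/hosts file reduces the delay.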