Threading Full Auto Physics.ppt
Threading Full Auto Physics
David Wu
Director of Technology, Pseudo Interactive

Agenda
- About Full Auto
- About Pseudo
- Review of Threading Techniques
- Full Auto Threading Strategy
- Full Auto Threading Reality
- What went right, what went wrong
- Random Notes

Full Auto
- Released on Feb 14, 2006 for the Xbox 360

Full Auto
- Design Vision: “The most destructive racing game ever”
- My Vision: “Physics running on multiple cores”
- Which I later updated to: “Physics running on multiple cores that has few race conditions and doesn't crash too often”
- Marketing preferred the Design Vision.

About Pseudo Interactive
- Pseudo Interactive is a team of great game developers.

Types of Parallelism
- Pipelined Stages
  - Discrete stages that feed into one another
  - E.g. Physics -> Render

Pipeline
- Similar to CPU/GPU parallelism
  T=3: CPU Game Frame 3 | GPU Render Frame 2
  T=4: CPU Game Frame 4 | GPU Render Frame 3
  T=5: CPU Game Frame 5 | GPU Render Frame 4

Pipeline
  T=3: Thread 0 collision detection Frame 3 | Thread 1 Logic/AI Frame 2 | Thread 2 Integration Frame 1
  T=4: Thread 0 collision detection Frame 4 | Thread 1 Logic/AI Frame 3 | Thread 2 Integration Frame 2
  T=5: Thread 0 collision detection Frame 5 | Thread 1 Logic/AI Frame 4 | Thread 2 Integration Frame 3

Types of Parallelism
- Pipelined Stages
  - Discrete stages that feed into one another
  - E.g. Physics -> Render
- Forking
  - Divide data and solve on multiple threads at once
  - E.g. collision detection on independent objects

Forking: Data Parallelism
- Perform the same task on multiple independent objects in parallel.
- The thread “forks” into multiple threads across multiple processors.
- All threads repeatedly grab pending objects indiscriminately and execute the task on them.
- When finished, the threads combine back into the original thread.

Forking
[Diagram: Fork -> Object A on Thread 2, Object B on Thread 0, Object C on Thread 1 -> combine]

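The fork/combine pattern on the slide above can be sketched roughly as follows. This is a minimal illustration, not the Full Auto code: `forkAndCombine` and its parameters are hypothetical names, `std::thread` stands in for the game's worker threads, and `std::atomic::fetch_add` stands in for whatever atomic primitive the engine uses to hand out pending objects.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper (not from the talk): run task(i) once for every index
// in [0, count) using numThreads threads, then wait for all of them.
void forkAndCombine(std::size_t count, unsigned numThreads,
                    const std::function<void(std::size_t)>& task) {
    assert(numThreads >= 1);
    std::atomic<std::size_t> next{0};  // index of the next pending object

    auto worker = [&] {
        // Each thread repeatedly grabs whichever object is pending next.
        for (;;) {
            std::size_t i = next.fetch_add(1, std::memory_order_relaxed);
            if (i >= count) break;  // nothing left to do
            task(i);
        }
    };

    std::vector<std::thread> helpers;  // "fork"
    for (unsigned t = 1; t < numThreads; ++t) helpers.emplace_back(worker);
    worker();                          // the calling thread works too
    for (auto& h : helpers) h.join();  // "combine"
}
```

Because every index is handed out exactly once, the task needs no locking as long as it only touches data belonging to its own object. Load balancing falls out of the design: a thread that draws cheap objects simply comes back for more.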
5-Mar-05

Types of Parallelism
- Pipelined Stages
  - Discrete stages that feed into one another
  - E.g. Physics -> Render
- Forking
  - Divide data and solve on multiple threads at once
  - E.g. collision detection on independent objects
- Jobs
  - Spin off tasks to be completed asynchronously
  - Lower priority tasks not in the critical path
  - E.g. effects, procedural mesh and texture updates

Jobs
[Diagram labels: Free Threads; High Priority Thread allocation; Main Game Loop (Physics, AI, Rendering, etc.); Job Task; Job Queue (Job, Job, Job)]
- Main game loop: synchronous, deterministic, critical path, latency critical
- Jobs: asynchronous, non-deterministic, low priority, latency tolerant

Full Auto X360 Thread Allocation
[Diagram: hardware threads in pairs: Thread 0 | Thread 1, Thread 2 | Thread 3, Thread 4 | Thread 5. Labels: Rendering; Main Thread (Physics, Game Logic, Animation, AI, Network, etc.); Audio; Worker threads; Voice encode/decode]

Full Auto Pipeline
[Diagram: Collision Detection; Collect net data and player input; Game Logic, Animation, AI; Integration and Constraint Solving; Kick render, audio, network data. Annotations: 30% CPU time, 40% CPU time, 30% CPU time]

Collision Detection
- All objects are stored in a global k-DOP tree.
- Moving objects can collide with moving objects and idle objects.
- Idle objects cannot collide with idle objects.
- The main thread forks into 5 threads (soliciting the worker threads).

Collision Detection
[Diagram: Moving Objects -> Broad Phase -> Narrow Phase -> Callbacks -> Register Contacts, running on four threads. Other labels: Constraint Manager; KDOP Tree (All Objects, Read Only); Logic Code]

Pipeline?
[Diagram: Broad Phase (BVH) -> Narrow Phase (GJK) -> Callbacks -> Register Contacts]
- This would not improve latency and would require double buffering of data.

Forking

Collision Detection Tasks
- Each moving object is a task.
- Threads pop objects off of the to-do list one by one, using interlocked access, until they are all processed.
- To avoid duplicate tests, moving objects are only tested against other moving objects whose memory address is lower.

Collision Detection
[Diagram: Moving Objects -> Object N Broad Phase -> Object N Narrow Phase -> Object N Callbacks -> Object N Register Contacts. Other labels: Constraint Manager; KDOP Tree (All Objects, Read Only); Logic Code]

Race Conditions
- There are multiple threads registering contacts simultaneously.
- To avoid race conditions we need atomic access.
- There are many terms and concepts that will suit our needs:
  - Mutex
  - Critical Section
  - Semaphore
  - Spin Lock

Mutex Rant
- Most researchers assert, and often prove, that locks/mutexes suck.
- I am convinced that this is true.
- But everything else sucks more.

Mutex 101
We use:
- atomic primitives whenever possible (InterlockedIncrement(), cellAtomicIncrement(), etc.)
- a Mutex variant for everything else, e.g.:
    constraintManagerMutex.Acquire();
    // Register contacts
    constraintManagerMutex.Release();

Spin lock Mutex
- Acquire() looks like:
    while(_InterlockedCompareExchange(
- With slight variations you can have a recursive mutex or a read/write mutex.
- System mutexes will set up events and sleep. This is useful if you have more software threads than hardware threads.

Collision Detection
[Diagram: Moving Objects -> Object N Broad Phase -> Object N Narrow Phase -> Object N Callbacks -> Object N Register Contacts. Other labels: Constraint Manager; KDOP Tree (All Objects, Read Only); Logic Code; Mutex; Mutex]

Constraint Manager
- The constraint manager keeps track of all interacting objects.
- The connected subgraphs form simulation batches of coupled objects: objects that must be solved together.
- More on this later.

Collision Detection Results
- There were enough objects to distribute tasks evenly with good load balancing.
- In the median case, with 5 threads across 3 cores, we had a 3-4x performance increase compared to 1 thread on 1 core.
- The worst case was about 2x. This was due to rare situations in which there were a lot of call-backs that held locks for too long.
- We wanted to improve this.

Collision Detection Callbacks
- We have many logic call-backs that trigger in response to contacts.
- Due to the quantity of logic code that was not thread safe, we added a lock around all call-backs that were suspicious.
- This way they can behave as though they are single threaded: they know that no other thread-agnostic logic is running while they are running.
- Call-backs are not allowed to access global collision data; they use information that is passed to them regarding the specific contact.

Collision Detection
[Diagram: Moving Objects -> Object N Broad Phase -> Object N Narrow Phase -> Object N Callbacks -> Object N Register Contacts. Other labels: Constraint Manager; KDOP Tree (All Objects, Read Only); Less thread friendly logic code; Thread friendly logic code; Mutex; Mutex]

Collision Detection Callbacks
- I considered adding extra software threads to help reduce the impact of waiting on locks.
- I.e.: each thread runs as usual, but when it needs a lock that is taken, it pushes its current context and starts from the beginning with the next available object. When the lock is available, it goes back to its first object and finishes that before continuing with the second object.

Collision Detection Callbacks
- It is generally important that you test all hypotheses.
- Bonus points if you talk about the results at GDC to save other game programmers time.
- Even if you don't want points, you might get a free pass to GDC if your lecture is accepted.
- That being said, it felt messy and I didn't want a lot of software threads hanging around, so I didn't try it.

Collision Detection Callbacks
- In the future we plan to make all call-backs deferred, with no assumptions as to ordering.
- This will be needed for PS3, where collision detection will not occur on a general purpose CPU.
- I anticipate a lot of debugging and pain.
[Diagram: PPU, PPU, SPU]

Collision Det
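The slides state the deferred call-back plan only as a goal. One way such a scheme could look is sketched below, purely as an illustration under assumptions: `Contact`, `detectThenCallBack`, and both function parameters are hypothetical names, and `std::thread` stands in for the game's worker threads. Each detection thread records contacts into its own buffer, and the logic call-backs run only after all threads have combined.

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical contact record: everything a call-back needs is copied in,
// so the call-back never has to read global collision data.
struct Contact {
    int objectA;
    int objectB;
};

// Run detect(threadIndex, buffer) on numThreads threads, each writing to its
// own buffer, then invoke onContact for every recorded contact on the caller.
void detectThenCallBack(
    unsigned numThreads,
    const std::function<void(unsigned, std::vector<Contact>&)>& detect,
    const std::function<void(const Contact&)>& onContact) {
    assert(numThreads >= 1);
    std::vector<std::vector<Contact>> perThread(numThreads);

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < numThreads; ++t)
        threads.emplace_back([&, t] { detect(t, perThread[t]); });
    for (auto& th : threads) th.join();

    // Deferred phase: runs on one thread, so logic code takes no locks.
    for (const auto& buffer : perThread)
        for (const Contact& c : buffer) onContact(c);
}
```

In this sketch the order in which call-backs fire depends on how objects happened to be distributed across threads, so logic code written against it could not rely on ordering, which matches the "no assumptions as to ordering" requirement on the slide.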
