Some Background: N-body simulation refers to the simulation of a dynamic system of particles under the influence of physical forces such as gravity, e.g. the solar system. N-body simulations are often avoided in games like Kerbal Space Program due to their complex nature, however have been used in games where simulation is a core part of gameplay in games such as Universe Sandbox. Below is the formula for Force.

This shows that the force will increase as the masses increase and as the distance closes. Since F=ma it makes sense to not find the force and instead just skip to the acceleration. Therefore we find the acceleration using the following calculation.

My Project: My project aims to implement a number of methods which simulate an N-body system. These methods will be compared to understand which methods are best suited to real time applications such as games.
Euler Method
Euler method is the simplest method to numerically solve N-body simulation. An initial for loop iterates through each particle. Nested in this loop is another for loop that sums the accelerations as shown above in equation 1.


It is important not to update the positions of the bodies until after all accelerations have been calculated, if the positions were to updated during the same stage some accelerations would be calculated based on position at time n and some at position from time n+1. This could cause inaccurate and unstable orbits.
Verlet Method
Verlet integration can be also used to integrate the velocities and positions of particles. Equation 5 is used to calculate positions.

Velocity is then calculated using Equation 6:

Since the velocity calculation requires the position of all particles to be calculated to find a(n+1), an initial loop calculates all new positions at time , shown at line 1 and 2 of figure 6. Once the initial loop has calculated the positions, line 3 and 4 start a new loop and a nested j loop, line 5 uses the new positions in the acceleration calculation to find the new accelerations. Lines 6 to 8 adds the values of a(n)/2 * h and a(n+1) /2 * h to v(n) and updates the acceleration.

4th order Runge-Kutta
The 4th order Runge Kutta method is a bit more complicated and considers the position and acceleration of particles at 4 timesteps. The RK4 method is defined in equation 7.

The equations of motion are applied to equation 7. Since the calculation for the acceleration requires the position to be calculated and the calculation for position requires the velocity which is calculated using acceleration, the k value for position and velocity must be calculated before the next k values can be calculated. Therefore 4 nested loops are required to sum the k accelerations and find the k position values.



The first K values calculated in lines 1 to 8 of figure 7 are the same as those found by the Euler method, since this is at time t, therefore the position and velocity are calculated using the same values and same structure. For k2 at line 15, the positions passed to the CalculateAcceleration function are modified by adding the k1/2 values, this must be done for both the i and j particles otherwise this would be calculating the forces based on positions from different times. The k3 calculations at line 24 follow the same structure, using the values of k2 instead of k1. Similarly, at line 33 the values of k4 are calculated but the k3 value is not divided by 2 as stipulated in equation 7. With all values of k calculated, a final loop is used to find the position and velocity of each particle in lines 38 to 42.
Barnes-Hut Method
So far each method have been what is known as direct methods. The Barnes-Hut method is known as an approximation method since it reduces the calculations and approximates calculations.
The first stage of the Barnes-Hut method recursively splits the system of particles into an octree. This tree is then used to determine if the particles are suitably far away to be approximated. For each particle the algorithm recursively goes down the tree starting at the root.
For smaller simulations the Barnes-Hut can cause the simulation to slow due to the time required to build the tree but for larger simulations the Barnes-Hut significantly increases the performance by reducing the number of calculations.


The diagrams above show a 2D system stored in a quad tree since it is easier to visualise the system in 2D. In actuality the system will be stored in 3D and therefore each leaf of the tree will be a 3D octant in an octree.
GPU Parallelisation
To further increase performance, CUDA, which allows GPU parallelisation , is used to improve the Euler method. CUDA makes use of kernals which are functions executed on the GPU device.
The GPU Euler method is very similar to the CPU implementation of Euler, with a few key changes.

While in theory each calculation could be done in parallel, this would require lots of memory overhead. Instead to make use of memory reuse and reduce the amount of transferring data from CPU to GPU memory, each thread will be assigned a particle, with each thread running in parallel and then in serial calculations will be executed.
The other key difference is the call to the function cudaDeviceSynchronize(), which makes the program wait for all threads to complete before continuing the simulation. This is important as some threads will take longer than others, and continuing without updating all particles would cause inaccuracies.
Comparing the methods.
We are interested in two things: performance and accuracy. Performance is easy, we simply time each step taken by each method and compare. Accuracy is considered by recording change in energy of the system. Due to the law of conservation of energy, we know that the total energy in the system should stay constant, however due to inaccuracies this will never fully be true, but we can get a good idea of how accurate each method is based on the change in energy.

Results

Above is the run time of each method which shows the direct methods to be much slower as N increases. As expected, given its 4 nested loops, the Runge-Kutta is the slowest of the methods for high values of N. Interestingly for lower values of N, the GPU, and tree methods are slower than the direct methods. This is probably due to the considerable overhead of building and updating the tree structure and traversing the structure for the Barnes-Hut methods, and the memory overhead assosiated with CUDA parallelism.

The above graph shows that the Runge-Kutta is the most accurate, showing the trade off between performance and accuracy. Interestingly the apprximations made by the Barnes-Hut do not have result in any noticeable increase in error, even at higher values of θ.

The graph above shows that the RK4 line oscilates the closest to the line of 0J, backing up the earlier claim that the RK4 is the most accurate. This graph also shows that the Verlet method has a greater variance than the Euler metod but in general is close to the Euler method in terms of accuracy and is even more accurate at certain times.




The above graphs show the change in energy at the end of a period of 10s for a system of 1000 bodies as the time step is varied. Each graph shows that as the timestep is increased so to does the error, proving that small timesteps are required to keep error low.
Conclusion
For some projects Euler method may be sufficient, it is much easier than other methods and therefore can be a useful solution. Direct methods most likely will not be fast enough for real time applications unless they are simulating particularly small N systems. Accuracy is a major stumbling block, this project only simulated small periods of time compared to the long periods most games or applications will look to simulate at high speeds. In these simulations, what seems like small errors will add up to much larger errors and issues. The Barnes-Hut method is most likely more useful for most games, since they are faster to simulate, however θ would need to be kept low to keep error low.
N-body simulations are most likely only viable if it is to be a core part of gameplay, e.g. Universe Sandbox, as it takes a lot of processing power which games can’t afford to dedicate to simulation when quicker approximations are easier to implement. The GPU provided the best results for larger values of N. However in our simulation, the GPU was solely being used for the simulation calculations, games require the GPU for a lot of other intensive graphics tasks, therefore it may not be viable. Also developers will have to make their game available to a wide range of hardware, perhaps some without a GPU and some with lower specification of GPU, they therefore cannot rely on the GPU to perform every task.
Therefore this project has found that there is no one method that fits all purposes, instead looking at the benefits and drawbacks of each method alongside the requirements of the simulation is necessary to understand which method is best suited. The choice of method is dependent on the conditions of the system, the number of bodies and the speed of simulation required.
Below is a link to my GitHub repository containing my project, also below is my full dissertation on my investigation into methods for real time simulation of an N-body system.