Efficiency is the aspect of computer programs that relates to the way the program instructs the computer. This article is not concerned with feasibility; it focuses on understanding maximum performance and total cost of ownership.
The overarching theme is that software is far less efficient than we ever thought. Software is largely primitive and confusing, even when we think it is not, and sometimes especially when we think it is. This holds even for the most fundamental building blocks of computing as we know it today, such as <string>.
If you’re a software guy and anything like us, you’ve come to realise that our ways are terrible and that we understand very little about how efficient the underlying hardware truly is when instructed appropriately.
The Five Laws of Efficiency
- If it’s small data and you want to do it very fast, do it in CPU+Cache.
- If the data can’t be made smaller then pass it to GPU+GPURAM.
- If GPU+GPURAM is not available or not suitable and CPU+Cache can’t handle the size then use CPU+RAM.
- If GPU+GPURAM or CPU+RAM is not available or not suitable and CPU+Cache can’t handle the size, then use (CPU+RAM)+(GPU+GPURAM).
- At the moment there are no other options for efficient computing (except when the use-case is storage).
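The five laws read naturally as an ordered decision procedure. The sketch below is one possible reading, not a definitive implementation: the function name `choose_tier`, the `Tier` enum, and all parameters are hypothetical, and "can't handle the size" is assumed to mean "the data does not fit in that memory".

```python
from enum import Enum

MiB = 1024 * 1024
GiB = 1024 * MiB


class Tier(Enum):
    CPU_CACHE = "CPU+Cache"
    GPU_GPURAM = "GPU+GPURAM"
    CPU_RAM = "CPU+RAM"
    COMBINED = "(CPU+RAM)+(GPU+GPURAM)"


def choose_tier(data_bytes, cache_bytes, ram_bytes, gpu_ram_bytes=0, gpu_suitable=False):
    """Pick a compute tier by walking the five laws in order.

    All sizes are in bytes. Every name here is an assumption of this sketch.
    """
    # Law 1: small data that must be fast stays in CPU+Cache.
    if data_bytes <= cache_bytes:
        return Tier.CPU_CACHE
    # Law 2: data that cannot be made smaller goes to the GPU, if one is usable.
    if gpu_suitable and data_bytes <= gpu_ram_bytes:
        return Tier.GPU_GPURAM
    # Law 3: no usable GPU and too big for the cache, so fall back to CPU+RAM.
    if data_bytes <= ram_bytes:
        return Tier.CPU_RAM
    # Law 4: neither tier is enough on its own, so use both together.
    if gpu_suitable and data_bytes <= ram_bytes + gpu_ram_bytes:
        return Tier.COMBINED
    # Law 5: there is no other efficient option.
    return None


if __name__ == "__main__":
    # Hypothetical machine: 8 MiB cache, 16 GiB RAM, 8 GiB GPU RAM.
    for size in (4 * MiB, 2 * GiB, 12 * GiB, 20 * GiB, 40 * GiB):
        tier = choose_tier(size, 8 * MiB, 16 * GiB, 8 * GiB, gpu_suitable=True)
        print(size, tier.value if tier else "no efficient option")
```

The order of the checks is the point: each tier is only considered once the faster one above it has been ruled out.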
Computers & CPUs
- Computers are simpler than most computer scientists think.
- Computers are made of CPUs, ports and hardware components.
- The CPU executes instructions.
- Instructions are sent to the CPU through Cache and various ports.
- Various hardware components connect with the CPU through various ports.
- Computer programs instruct hardware and CPU.
- Hardware is more efficient than you think.
Components & Ports
- The Cache and the various ports come in various speeds.
- The Cache is built directly into the processor.
- GPU is connected through the PCI-Express port.
- Network card is connected through the PCI-Express port.
- RAM is connected through the DDR-3 port.
- SSD is connected through the SATA port.
The difference between GPU+GPURAM and CPU+Cache is that the latter is a direct physical relationship within the same construct (the processor itself), while GPURAM merely sits on the same board as the GPU.
- All hardware components connect through the various ports and those ports come with various speeds.
- SSD is very slow because of the SATA port. Everything going through PCI-Express is slow. DDR-4 is slightly faster but still slow. Cache is the King.
- If a hardware component (e.g. network card) is printed on the board it has the potential of being faster than the same component not printed on the board.
- Anything going through a cable or a metal connection of any kind has to be serialised and is therefore slower.
- High performance computing takes place within the available Cache size.
- Exceeding the available Cache size will lead to a significant reduction in all-round efficiency.
- The kernel will hog the cache.
- If the kernel is not hogging all of the cache, the libraries will.
- Binary integers are the language of computers.
- Binary integer based programming is more efficient than you think.
- Even pure binary integers can be further reduced to smaller size.
- Fixed memory allocation (e.g. 16 bits per word) is faster than dynamic memory allocation.
- Bitwise operations (AND, OR, XOR, shifts) are among the fastest operations a CPU can perform.
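The last few points (smaller integers, fixed-width words, bitwise operations) can be combined in one small illustration. The sketch below assumes every value fits in 4 bits, so four values share a single 16-bit word; the names `pack` and `get` are hypothetical. Python is used only to show the layout; it says nothing about the speed a compiled program would achieve.

```python
from array import array

BITS_PER_VALUE = 4                      # assumption: every value is in 0..15
VALUES_PER_WORD = 16 // BITS_PER_VALUE  # four values share one 16-bit word
MASK = (1 << BITS_PER_VALUE) - 1        # 0b1111


def pack(values):
    """Pack small integers into fixed-width 16-bit words."""
    n_words = (len(values) + VALUES_PER_WORD - 1) // VALUES_PER_WORD
    words = array("H", [0]) * n_words   # "H" = unsigned 16-bit, allocated once
    for i, v in enumerate(values):
        if not 0 <= v <= MASK:
            raise ValueError(f"value {v} does not fit in {BITS_PER_VALUE} bits")
        shift = (i % VALUES_PER_WORD) * BITS_PER_VALUE
        words[i // VALUES_PER_WORD] |= v << shift   # bitwise OR places the value
    return words


def get(words, i):
    """Read value i back using only a shift and a mask."""
    shift = (i % VALUES_PER_WORD) * BITS_PER_VALUE
    return (words[i // VALUES_PER_WORD] >> shift) & MASK


if __name__ == "__main__":
    values = [3, 15, 0, 9, 7]
    words = pack(values)
    print(len(words), "words hold", len(values), "values")
    print([get(words, i) for i in range(len(values))])
```

Packed this way, the same data occupies a quarter of the space it would take with one 16-bit word per value, which makes it more likely to stay within the available Cache size.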