From what you describe, the issue is somewhere in the OS (I/O, memory allocation, or something else) and not in the network. A 20 Gbps link on ConnectX-2 gives you a theoretical maximum of 16 Gbps of data because of 8b/10b encoding, so 15.6 Gbps is pretty close.
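To make the arithmetic explicit, here is a small sketch of the encoding overhead calculation. The function name `effective_gbps` is just made up for illustration; the 20 and 15.6 figures are the ones from your question.

```python
def effective_gbps(signaling_gbps: float, data_bits: int = 8, line_bits: int = 10) -> float:
    """Usable data rate once line-encoding overhead is removed.

    With 8b/10b encoding every 8 data bits are sent as 10 bits on the wire.
    """
    return signaling_gbps * data_bits / line_bits


signaling = 20.0   # link signaling rate in Gbps
measured = 15.6    # throughput reported in the question, in Gbps

ceiling = effective_gbps(signaling)
efficiency = measured / ceiling
print(f"ceiling: {ceiling:.1f} Gbps, measured is {efficiency:.1%} of it")
```

So the raw link is already running at about 97.5% of what the encoding allows, which is why I would look elsewhere.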
I would suggest using perf to analyze the ssh/rsync behaviour, or strace with the '-ttt -T' options to see how much time is spent in each system call.
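If the strace output gets long, something like the following can total the time per system call. This is only a rough sketch: it assumes the trace was saved to a file (for example with `strace -f -ttt -T -o trace.log -p <pid>`), the names `summarize` and `LINE_RE` are my own, and it simply skips lines it cannot parse, including the "unfinished"/"resumed" pairs that show up when following several processes.

```python
import re
import sys
from collections import defaultdict

# Matches lines such as:
#   1589302345.100000 read(3, "abc", 4096) = 3 <0.000100>
#   12345 1589302345.300000 read(3, "", 4096) = 0 <0.000400>
# Optional pid prefix, epoch timestamp (-ttt), syscall name, and the
# elapsed time in angle brackets at the end of the line (-T).
LINE_RE = re.compile(
    r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?\d+\.\d+\s+(\w+)\(.*<(\d+\.\d+)>\s*$"
)


def summarize(lines):
    """Return {syscall: [call_count, total_seconds]} for parsable lines."""
    totals = defaultdict(lambda: [0, 0.0])
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # signals, exit markers, unfinished/resumed calls
        name, seconds = match.group(1), float(match.group(2))
        totals[name][0] += 1
        totals[name][1] += seconds
    return dict(totals)


# Only runs when a trace file path is passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as trace:
        stats = summarize(trace)
    # Print the most expensive system calls first.
    for name, (count, seconds) in sorted(
        stats.items(), key=lambda item: item[1][1], reverse=True
    ):
        print(f"{name:<16} calls={count:<8} total={seconds:.6f}s")
```

If most of the time lands in read/write on the file descriptors rather than on the sockets, that would point at disk I/O; strace's own `-c` option gives a similar summary without a script.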