Mac OS X SMB Performance Tuning for Gigabit Speed

Posted in Mac

The original post was taken from here:

OS X SMB Performance Tuning for Gigabit Speed


There is a decent amount of documentation out there detailing the tunable parameters of the Mac OS X IP stack. However, most of these documents either offer basic suggestions without much background on a particular setting, or they discuss some of the implications of changing certain parameters without giving solid guidance on the best configuration for a particular scenario. Many of the parameters depend on one another, so the configuration should be approached with that in mind. This document applies to OS X 10.5 Leopard, 10.6 Snow Leopard, and 10.7 Lion.

The closest thing I have found to a full discussion and tutorial on the topic can be found here. Unfortunately, that link is now offline. For now, you can at least find it in the Wayback Machine archive here. Thank you to martineau(at)linxure.net for pointing this out.

The above document is a great reference to bookmark. I thought I would also include my own thoughts on this topic to shed some additional light on the subject.

UPDATED

This post has generated a fair amount of feedback due to issues readers encountered. I have updated it to address certain strange connectivity behaviors, most of which I believe were caused by the aggressive TCP keepalive timers. I have not yet been able to recreate any of the strange side effects with these updates. So far, so good. After nearly 5 months of using these settings, I can report that all local applications and systems are running well. I definitely notice more zip in webpage display inside Chrome, and I am able to sustain higher throughput on various speed tests than before.

For reference, here are the custom settings I have added to my own sysctl.conf file:


kern.ipc.maxsockbuf=4194304
kern.ipc.somaxconn=512
kern.ipc.maxsockets=2048
kern.ipc.nmbclusters=2048
net.inet.tcp.rfc1323=1
net.inet.tcp.win_scale_factor=3
net.inet.tcp.sockthreshold=16
net.inet.tcp.sendspace=262144
net.inet.tcp.recvspace=262144
net.inet.tcp.mssdflt=1440
net.inet.tcp.msl=15000
net.inet.tcp.always_keepalive=0
net.inet.tcp.delayed_ack=0
net.inet.tcp.slowstart_flightsize=4
net.inet.tcp.blackhole=2
net.inet.udp.blackhole=1
net.inet.icmp.icmplim=50

The easiest way to edit this file is to open a Terminal window and run ‘sudo nano /etc/sysctl.conf’. The sudo command runs nano with administrative (root) privileges; you will be prompted for your password if your account has admin rights. nano is a simple command line text editor. The entries above just get added to this file, one per line.

You can also update the running settings without rebooting by using the ‘sudo sysctl -w’ command; just append each of the above settings, one at a time, after this command. The exceptions are kern.ipc.maxsockets and kern.ipc.nmbclusters, which can only be changed from the sysctl.conf file and take effect at the next reboot.
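As a minimal sketch of that workflow (using net.inet.tcp.delayed_ack from the list above purely as the example parameter), the Terminal session looks roughly like this:

sudo nano /etc/sysctl.conf                   # add the entries above, one per line
sudo sysctl -w net.inet.tcp.delayed_ack=0    # apply a single setting to the running system
sysctl net.inet.tcp.delayed_ack              # confirm the value now in effect

Anything added only to the file takes effect after the next reboot, so either reboot or repeat the sysctl -w step for each entry you want active immediately.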

Below are my explanations of each of the parameters I have customized or included in my sysctl.conf file:

  1. One suggestion out there is to set the kern.ipc.maxsockbuf value to the sum of the net.inet.tcp.sendspace and net.inet.tcp.recvspace variables. The key point is that this value cannot be any less than the sum of those two values, or it can cause fatal errors on your system. The default value, at least what I found on 10.6.5, is 4194304, which is more than enough. In my case, I have simply hard-coded this value into my sysctl.conf file to ensure that it does not change and cause problems. If you are trying to tune for high throughput over Gigabit connectivity, you may want to increase this value as recommended in the TCP Tuning Guide for FreeBSD. The general suggestion is to set it to at least twice the Bandwidth Delay Product (see the worked example after this list). For the majority of people using their Mac on a DSL or cable connection, that value would be much less than what you need to support local transfers on your LAN. Personally, I would leave this at the default but hard-code it just to be sure.
  2. kern.ipc.somaxconn limits the size of the listen queue, i.e. the number of pending connections that can be waiting to be accepted at any one time. The default here is just 128. If an attacker can flood you with a sufficiently high number of SYN packets in a short enough period of time, the queue fills up and legitimate users are denied access to the service. Increasing this value is also beneficial if you run automated programs like P2P clients that can exhaust the connection backlog very quickly.
  3. kern.ipc.maxsockets and kern.ipc.nmbclusters set the connection thresholds for the entire system. One socket is created per network connection, and one per UNIX domain socket connection. While remote servers and clients connect to you over the network, more and more local applications take advantage of UNIX domain sockets for inter-process communication. There is far less overhead, as full TCP packets do not have to be constructed, and UNIX domain socket communication is much faster because data does not have to go over the network stack and can instead go almost directly to the other application. The number of sockets you will need depends on what applications are running. I would recommend starting with a value matching the number of network buffer clusters, and then tuning it as appropriate. You can find out how many network buffer clusters are in use with the command netstat -m (see the example after this list). The defaults are usually fine for most people. However, if you want to host torrents, you will likely want to tune these values to 2 or 4 times the default of 512. The kern.ipc.nmbclusters value appears to default to 32768 on 10.7 Lion, so this should not be something you have to tune going forward.
  4. I have hard-coded the enabling of RFC 1323 (net.inet.tcp.rfc1323), the TCP High Performance Options (window scaling). This should be on by default on all future OS X builds. I have also hard-coded the window scaling factor to a value of 3, which is also now the default. Unfortunately, the Internet moves slowly when it comes to a significant change to fundamental protocol behavior. Setting this value to 3 is a balance between gaining throughput and risking poor connection performance with devices that mishandle window scaling. If you notice unacceptably poor performance with key applications you use, I would suggest you disable this option and make sure your net.inet.tcp.sendspace and net.inet.tcp.recvspace values are set no higher than 65535. Applications served by load-balanced servers using a Layer 5 type ruleset can exhibit performance problems with window scaling if the window scaling configuration has not been properly addressed on the load balancer.
  5. The next option I have set is net.inet.tcp.sockthreshold. This parameter sets the number of open sockets at which the system will begin obeying the values you set in the net.inet.tcp.recvspace and net.inet.tcp.sendspace parameters. The default value is 64. Essentially, think of this as a quality-of-service threshold. Prior to reaching this number of simultaneous sockets, the system restricts itself to a maximum window of 65536 bytes per connection. As long as your window sizes are set above 65536, once you hit the socket threshold the performance should always be better than anything you were previously accustomed to. The higher you set this value, the more opportunity you give the system to “take over” all of your network resources.
  6. The net.inet.tcp.sendspace and net.inet.tcp.recvspace settings control the maximum TCP window size the system will allow when sending from or receiving to the machine. Up until the latest releases of most operating systems, these values defaulted to 65535 bytes. This has been the de facto standard essentially since the beginning of the TCP protocol’s existence in the 1970s. Now that the RFC 1323 high performance TCP options are becoming more widely accepted and configured, these values can be increased to improve performance. I have set mine both to 262144 bytes, essentially 4 times the previous limit. You must have the RFC 1323 options enabled in order to set these values above 65535.
  7. The net.inet.tcp.mssdflt setting seems simple to configure on the surface. However, arriving at the optimum setting for your particular network setup can be a mathematical exercise that is not straightforward. The default MSS value that Apple has configured is a measly 512 bytes. This is not really noticeable on a high speed LAN segment, but it can be a performance bottleneck across a typical residential broadband connection. This setting adjusts the Maximum Segment Size that your system will transmit, and you need to understand the characteristics of your own network connection in order to determine the appropriate value. For a machine that only communicates with other hosts across a normal Ethernet network, the answer is simple: the standard Ethernet MTU is 1500 bytes, and after subtracting 20 bytes each for the IP and TCP headers you are left with the normal Ethernet MSS of 1460 bytes. In my case, I have a DSL line that uses PPPoE for its transport protocol. To get the most out of the line and avoid wasteful protocol overhead, I want this value to be exactly equal to the amount of payload data I can fit in a single PPPoE frame; fragmenting segments creates additional PPPoE frames and ATM cells, which adds overhead and reduces my effective bandwidth. There are quite a few references out there to help you determine the appropriate setting. For a DSL line that uses PPPoE like mine, start from the Ethernet MSS of 1460 bytes and subtract the 8 bytes of PPPoE header overhead, which leaves an MSS of 1452 bytes. There is one other element to account for: ATM. Many DSL providers, like mine, use the ATM protocol as the underlying transport carrier for your PPPoE data (that used to be the only way it was done). ATM uses 53 byte cells, each with a 5 byte header, leaving 48 bytes of payload per cell. An MSS of 1452 bytes does not divide evenly across ATM’s 48 byte cell payloads: 1452 / 48 = 30.25, so the last cell would carry only 12 bytes of data and ATM would fill it with 36 bytes of null padding. To avoid this overhead, I reduce the MSS to 1440 bytes (30 x 48 = 1440) so that segments fit evenly into ATM cells. The arithmetic is summarized after this list.
  8. net.inet.tcp.msl defines the Maximum Segment Lifetime. This is the maximum amount of time, in milliseconds, to wait for an ACK in reply to a SYN-ACK or FIN-ACK; if the ACK is not received in this time, the segment is considered “lost” and the network connection is freed. This setting is primarily about DoS protection, and there are two implications. When you are trying to close a connection, if the final ACK is lost or delayed, the socket will still close, and more quickly. However, if a client trying to open a connection to you has its ACK delayed by more than this interval, the connection will not form. RFC 793 defines the MSL as 120 seconds (120000 ms), but that was written in 1981 and timing assumptions have changed since then. Today, FreeBSD’s default is 30000 ms. This is sufficient for most conditions, but for stronger DoS protection you will want to lower it. I have set mine to 15000.
  9. It appears that the aggressive TCP keepalive timers below are not well liked by quite a few built-in applications, so I have removed those adjustments and kept the defaults for the time being (you can check the current defaults with the command shown after this list). net.inet.tcp.keepidle sets the interval, in milliseconds, after which the system sends a keepalive packet to test whether an idle connection is still active. I had set this to 120000, or 120 seconds, which is a fairly common interval. The default is 2 hours.
  10. net.inet.tcp.keepinit sets the keepalive probe interval, in milliseconds, during initiation of a TCP connection. I had set mine to the same as the regular interval, 1500, or 1.5 seconds.
  11. net.inet.tcp.keepintvl sets the interval, in milliseconds, between keepalive probes sent to remote machines. After TCPTV_KEEPCNT (default 8) probes are sent with no response, the TCP connection is dropped. I had set this value to 1500, or 1.5 seconds.
  12. net.inet.tcp.delayed_ack controls the behavior when sending TCP acknowledgements. Allowing delayed ACKs can slow some systems down. There is also some known poor interaction with the Nagle algorithm in the TCP stack when dealing with slow start and congestion control. I have disabled this feature.
  13. net.inet.tcp.slowstart_flightsize sets the number of outstanding packets permitted with non-local systems during the slowstart phase of TCP ramp up. In order to more quickly overcome TCP slowstart, I have bumped this up to a value of 4.
  14. net.inet.tcp.blackhole defines what happens when a TCP packet is received on a closed port. When set to '1', SYN packets arriving on a closed port are dropped without an RST packet being sent back. When set to '2', all packets arriving on a closed port are dropped without an RST being sent back. This saves CPU time, because the packets need less processing, and outbound bandwidth, because no reply packets are sent out.
  15. net.inet.udp.blackhole is similar to net.inet.tcp.blackhole in its function. As the UDP protocol does not have states like TCP, there is only a need for one choice when it comes to dropping UDP packets. When net.inet.udp.blackhole is set to '1', all UDP packets arriving on a closed port will be dropped.
  16. The name 'net.inet.icmp.icmplim' is somewhat misleading. This sysctl controls the maximum number of ICMP “Unreachable” and also TCP RST packets that will be sent back every second. It helps curb the effects of attacks which generate a lot of reply packets. I have set mine to a value of 50.
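As a rough worked example for item 1, here is the Bandwidth Delay Product (BDP) arithmetic. The link speeds and round-trip times below are assumed round numbers for illustration; plug in your own:

BDP = link speed (bytes per second) x round-trip time (seconds)
Gigabit LAN, ~1 ms RTT:           (1,000,000,000 / 8) x 0.001 = 125,000 bytes
20 Mbit/s broadband, ~30 ms RTT:  (20,000,000 / 8) x 0.030 = 75,000 bytes

Twice the larger of these is 250,000 bytes, comfortably below the 4194304 byte default for kern.ipc.maxsockbuf, which is why leaving the default alone is usually fine.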
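For item 3, the current buffer usage and the system-wide limits can be checked from a Terminal window. This is just a quick sketch using standard commands:

netstat -m                                          # summary of mbuf and cluster usage
sysctl kern.ipc.maxsockets kern.ipc.nmbclusters     # the limits currently in effect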
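The MSS arithmetic from item 7, summarized (1500 byte Ethernet MTU, 20 byte IP and TCP headers, 8 byte PPPoE header, and 48 byte ATM cell payloads, as discussed above):

Ethernet MTU                                 1500 bytes
minus IP header (20) and TCP header (20)  =  1460 bytes  (normal Ethernet MSS)
minus PPPoE header (8)                    =  1452 bytes
1452 / 48 = 30.25 ATM cells  (last cell carries 12 bytes of data plus 36 bytes of padding)
30 x 48 = 1440 bytes                      -> net.inet.tcp.mssdflt=1440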
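For item 9, if you want to see what your machine's keepalive values currently are before deciding whether to touch them, a quick check looks roughly like this:

sysctl net.inet.tcp.always_keepalive net.inet.tcp.keepidle net.inet.tcp.keepinit net.inet.tcp.keepintvl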