Sunday, January 11, 2015

Mach Ports

Classically, Unix IO was performed on file descriptors. There are two kinds of interactions with a file descriptor: stream-based interactions and packet-based interactions. Stream-based interactions are designed for a flow of bytes, which is the model for file IO and TCP connections. When using these kinds of streams, you use the read()/write() family of system calls. The kernel can coalesce the data from multiple write() calls into a single read(), which makes sense - the model is a stream of bytes.
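
To make the coalescing concrete, here is a minimal sketch in C. It assumes a Unix-like system where a pipe hands back all buffered bytes in one read(); the helper name coalesced_read() is made up for this example.

```c
#include <unistd.h>

/* Writes two separate chunks into a pipe, then issues a single read().
 * Returns the number of bytes that one read() returned, or -1 on error.
 * `out` must have room for at least `cap` bytes. */
ssize_t coalesced_read(char *out, size_t cap)
{
    int fds[2];                 /* fds[0] = read end, fds[1] = write end */
    if (pipe(fds) != 0)
        return -1;

    /* Two distinct write() calls... */
    if (write(fds[1], "hello ", 6) != 6 || write(fds[1], "world", 5) != 5) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* ...can be satisfied by one read(): the boundary between them is gone. */
    ssize_t n = read(fds[0], out, cap);

    close(fds[0]);
    close(fds[1]);
    return n;
}
```

Calling coalesced_read() with a 32-byte buffer should hand back all 11 bytes ("hello world") at once, with nothing to tell you that they arrived in two writes.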

Packet-based IO is used for things like UDP sockets. When performing this kind of IO, you use sendmsg()/recvmsg(), and the semantics of reliability and ordering come from the underlying transport mechanism (meaning: if you use UDP over the Internet, neither strict ordering nor reliable delivery is guaranteed).

Unix has the concept of pipes and sockets. Pipes are byte-oriented, while sockets can be either byte-oriented or packet-oriented, depending on how you create them (a socket can carry a TCP stream or UDP datagrams). The userland interface to both of these concepts is a file descriptor, which you pass to read()/write() or sendmsg()/recvmsg() to perform the required IO.

It is also possible to use these mechanisms for communication between processes on the same machine. The pipe() system call creates a new pipe and hands you both ends, one for reading and one for writing, and the socketpair() system call creates a pair of connected sockets, each of which can be used for both reading and writing. Usually, when we’re interacting with another process, we want to perform some sort of RPC [1], so the packet-based model is a better conceptual fit than the stream model. (It is possible to layer a multiplexer / demultiplexer on top of a byte-oriented stream so that you can pass messages over it, but packet-based communication doesn’t have this need.) A unix domain socket [2] even has the appropriate semantics: messages are delivered reliably and in order.
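
For contrast with the stream model, here is a small C sketch of the packet model over a unix domain socket pair. It assumes SOCK_DGRAM is available for AF_UNIX (it is on Linux and OS X); the function name datagram_boundaries() is just an illustration.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Sends two datagrams over a unix domain socket pair and receives them
 * one at a time. On success, stores the size of each received datagram
 * in sizes[0] and sizes[1] and returns 0; returns -1 on error. */
int datagram_boundaries(ssize_t sizes[2])
{
    int sv[2];
    char buf[64];
    int rc = -1;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) != 0)
        return -1;

    if (send(sv[0], "hello ", 6, 0) == 6 && send(sv[0], "world", 5, 0) == 5) {
        /* Each recv() returns exactly one message, even though the buffer
         * is large enough to hold both. */
        sizes[0] = recv(sv[1], buf, sizeof buf, 0);
        sizes[1] = recv(sv[1], buf, sizeof buf, 0);
        if (sizes[0] >= 0 && sizes[1] >= 0)
            rc = 0;
    }

    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

Here the two sends come back as two separate receives of 6 and 5 bytes, so the receiver never has to work out where one message ends and the next begins.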

So what exactly is a packet, then? Well, it’s described by struct msghdr. There are four conceptual pieces:
  1. The destination address. For connected unix domain sockets (such as the ones you get from socketpair()), this is ignored.
  2. An array of buffers
  3. Protocol-specific control information. For unix domain sockets, this is a struct cmsghdr
  4. Flags
Everything is pretty straightforward, but it is interesting that you can specify control information in your packet. The control information is a sequence of cmsghdr records, each of which is a triplet containing:
  1. The originating protocol. This should be SOL_SOCKET
  2. The protocol-specific type of control information (an int)
  3. An arbitrary buffer
For a unix domain socket, the type can be SCM_RIGHTS, which is used for sending a file descriptor over the socket. In particular, if I have a file descriptor, I can use the SCM_RIGHTS control type and put the file descriptor in the buffer. When the message is delivered, the kernel duplicates the file descriptor into the receiving process, so the descriptor that the reader gets refers to the same open file as the one that was sent. You can also specify SCM_TIMESTAMP, SCM_CREDS, and SCM_TIMESTAMP_MONOTONIC types.
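
Here is a sketch of what passing a descriptor with SCM_RIGHTS can look like in C. The helper names send_fd() and recv_fd() are my own, and the code assumes that a single byte of ordinary data travels alongside the control information (sending no data at all is not portable).

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer with room for exactly one descriptor. The union only
 * exists to give the char array the alignment of a struct cmsghdr. */
union fd_control {
    struct cmsghdr align;
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Sends `fd` over the unix domain socket `sock`. Returns 0 or -1. */
int send_fd(int sock, int fd)
{
    char byte = 'F';                         /* one byte of ordinary data */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_control control;
    struct msghdr msg;

    memset(&control, 0, sizeof control);
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;                      /* the array of buffers */
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;           /* the control information */
    msg.msg_controllen = sizeof control.buf;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;           /* originating protocol */
    cmsg->cmsg_type = SCM_RIGHTS;            /* type of control information */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receives one descriptor from `sock`. Returns the new descriptor or -1. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_control control;
    struct msghdr msg;
    int fd;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof control.buf;

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;

    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;                               /* the kernel-made duplicate */
}
```

The descriptor number that recv_fd() returns will generally differ from the number that was sent, but both refer to the same underlying open file, so writing to one end of a pipe through the received descriptor shows up at the original read end.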

Mach Ports [3] are conceptually similar to this model. A mach port is a message-based communications mechanism. However, it allows for much finer-grained control than the packet-based approach described above.

The concept of a mach port is distinct from the concept of the rights that you have to interact with the port. The rights that you have for a particular port can change over time. There are three important types of rights: send rights, receive rights and send-once rights. A send-once right is exactly what it sounds like: the right to send to a port exactly one time.

Just like with socketpair(), you can create a new port with mach_port_allocate(), passing in MACH_PORT_RIGHT_RECEIVE. Once you have the receive right for the port, you can then grant yourself a send right with mach_port_insert_right().

You actually use the same call to send and receive from a port: mach_msg(). The message itself is modeled in the mach_msg_header_t structure, which has the following conceptual pieces:
  1. A remote port. When sending, this is the destination, and when receiving, this is the port the sender supplied for replies (if any).
  2. A local port. When sending, you specify a port here for the receiver to reply to you with. This can be MACH_PORT_NULL if you don’t want the receiver to reply.
  3. A buffer
  4. A message ID. This is application-specific; the receiver can do whatever it wants with this. Usually it is an identifier of a particular RPC call; every function gets its own ID.
  5. Flags. These flags are used for the following:
    1. Remote right. This is used so you can specify which right you want to use when sending the message
    2. Local right. This is the right that the other end should get on the reply port that the sender specifies.
    3. Possibly mark the message as “complex.”
As you can see, mach ports aren’t bidirectional, and they don’t come in pairs. A port represents one end of a communication, so if you want to simply send a port a message, you don’t need a port of your own - you only need a send right to the destination port.
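
Putting the pieces above together, here is a rough C sketch of sending a message to a port and receiving it again within a single task. It only builds on Darwin (hence the #ifdef), the function name mach_roundtrip() and the message layout are my own choices, and I’m assuming the default minimal trailer on receive, so treat it as an illustration rather than a reference.

```c
#ifdef __APPLE__   /* the Mach APIs below exist only on Darwin */
#include <mach/mach.h>
#include <string.h>

/* Sends `value` to a freshly allocated port in this task and receives it
 * back. Returns the received payload, or -1 on failure (so -1 itself is
 * not a useful value to send in this sketch). */
int mach_roundtrip(int value)
{
    mach_port_t self = mach_task_self();
    mach_port_t port = MACH_PORT_NULL;
    int result = -1;

    /* Create the port (we hold the receive right), then add a send right. */
    if (mach_port_allocate(self, MACH_PORT_RIGHT_RECEIVE, &port) != KERN_SUCCESS)
        return -1;
    if (mach_port_insert_right(self, port, port,
                               MACH_MSG_TYPE_MAKE_SEND) != KERN_SUCCESS) {
        mach_port_mod_refs(self, port, MACH_PORT_RIGHT_RECEIVE, -1);
        return -1;
    }

    struct { mach_msg_header_t header; int payload; } out;
    memset(&out, 0, sizeof out);
    /* Remote right: use (a copy of) our send right. Local right: none. */
    out.header.msgh_bits = MACH_MSGH_BITS(MACH_MSG_TYPE_COPY_SEND, 0);
    out.header.msgh_size = sizeof out;
    out.header.msgh_remote_port = port;            /* destination */
    out.header.msgh_local_port = MACH_PORT_NULL;   /* no reply wanted */
    out.header.msgh_id = 1;                        /* application-specific */
    out.payload = value;

    /* The receive buffer needs extra room for the kernel-appended trailer. */
    struct {
        mach_msg_header_t header;
        int payload;
        mach_msg_trailer_t trailer;
    } in;
    memset(&in, 0, sizeof in);

    if (mach_msg(&out.header, MACH_SEND_MSG, sizeof out, 0, MACH_PORT_NULL,
                 MACH_MSG_TIMEOUT_NONE, MACH_PORT_NULL) == MACH_MSG_SUCCESS &&
        mach_msg(&in.header, MACH_RCV_MSG, 0, sizeof in, port,
                 MACH_MSG_TIMEOUT_NONE, MACH_PORT_NULL) == MACH_MSG_SUCCESS)
        result = in.payload;

    /* Drop the send right, then the receive right (which destroys the port). */
    mach_port_deallocate(self, port);
    mach_port_mod_refs(self, port, MACH_PORT_RIGHT_RECEIVE, -1);
    return result;
}
#endif /* __APPLE__ */
```

Note that the sender only names the destination port and a send right; the receive side names the port it holds the receive right for. Nothing here comes in a pair.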

A “complex” message is where things get even more interesting. This is where you can specify an array of “descriptors” after the message header. There are several kinds of descriptors. One kind wraps a mach port which can be sent to the receiver, similar to the SCM_RIGHTS example above. Another kind specifies out-of-line data which can be sent along with the message. This out-of-line descriptor specifies whether the data should be copied to the receiver, or whether the pages should simply be mapped into the receiver. A third kind of descriptor serves as a combination of the two previous kinds, and allows you to specify an out-of-line array of port names to send to the receiver.

Performing asynchronous IO with mach ports is possible by using port sets. You can create a port set which contains particular ports, and pass the port set to mach_msg() when receiving. The call will then receive a message from whichever port in the set has one available. You can therefore use this similarly to how I described using select() [4].

One of the big differences between file descriptors and mach ports is that ports are not inherited through a fork() the way that file descriptors are. This means that if a process wants to talk to its child processes, it must find them using a general lookup mechanism that any process can use to look up any service. This is implemented using a process’s “bootstrap port,” which is a port that every process automatically gets and that it can use to talk to a system service. The system service’s job is to allow processes to register ports under a particular name, and to allow processes to find ports by name. So, if two processes want to talk to each other, one uses its bootstrap port to register a port under a well-known service name, and the other sends the same well-known name to its bootstrap port and asks for a port which it can use to communicate with the service. In other words, there is a service discovery mechanism built right in to the mach ports infrastructure.

As you can see, all these additional facilities make mach ports ideal for interprocess RPC. Being able to explicitly codify reply messages, being able to explicitly give a message an ID which corresponds to the RPC call being performed, being able to send shared out-of-line memory, being able to send ports through ports, and being able to perform dynamic service discovery all stack up to make it a great choice.

However, I want to point out that mach ports don’t allow you to do anything that you couldn’t already do with socket packets. Sending a file descriptor over a unix domain socket is possible, as described above. Out-of-line data can be shared by sending the file descriptor of a shared memory segment created by shm_open() (or the identifier of one created by shmget()). The smaller message size limit for socket packets can be worked around by splitting a large message into several smaller packets (indeed, it looks like there is even some system facility for this with the MSG_WAITALL flag). File descriptors don’t have the same access controls that ports have, but that just means that disobeying the access controls is something that you can do with file descriptors that you can’t do with ports. Dynamic service discovery is possible with a process that listens at a well-known path (a named pipe or a unix domain socket), similar to how D-Bus [5] works.

There is a big difference, though, between the two approaches. The performance of mach ports seems dramatically better than the performance of unix domain sockets. On my machine, the largest packet that I could send through a unix domain socket was 2048 bytes, which is fairly small when it comes to RPC calls. Using messages of this size, my throughput using mach messages was around 2.5 times that of unix domain sockets. In addition, the maximum size of a message that I could send through a mach port was on the order of 50 megabytes, which means that a large message would need many more socket packets than mach messages, which would only make the case for unix domain sockets worse. I do wonder, however, if there is something innately different between sending a packet and sending a mach message that makes mach fundamentally faster, or if unix domain sockets could be sped up.

[1] http://en.wikipedia.org/wiki/Remote_procedure_call
[2] http://en.wikipedia.org/wiki/Unix_domain_socket
[3] http://web.mit.edu/darwin/src/modules/xnu/osfmk/man/
[4] http://litherum.blogspot.com/2015/01/asynchronous-io.html
[5] http://www.freedesktop.org/wiki/Software/dbus/
