Fundamentals: Sockets
What is a socket?
A socket is a tool that allows communication between two processes, either on the same machine or on different machines. More precisely, it’s a way to talk to other programs using standard Unix file descriptors. In Unix, every I/O action is done by reading from or writing to a file descriptor. A file descriptor is just an integer associated with an open file, and that "file" can be a network connection, a text file, a terminal, or something else.
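To make the "just an integer" point concrete, here is a minimal sketch in Python, whose `socket` module wraps the same BSD socket calls. It assumes a Unix-like system (Linux or macOS). `tempfile.mkstemp()` is used only to get an ordinary on-disk file to compare against, and `kind_of` is a hypothetical helper written for this example.

```python
import os
import socket
import stat
import tempfile


def kind_of(fd):
    """Classify what an integer descriptor refers to, using fstat()."""
    mode = os.fstat(fd).st_mode
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISREG(mode):
        return "regular file"
    return "other"


# mkstemp() opens a real on-disk file and returns its raw integer descriptor.
file_fd, path = tempfile.mkstemp()

# socket.socket() calls socket(); fileno() exposes the integer the kernel returned.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock_fd = sock.fileno()

# Both are plain ints; only fstat() reveals what kind of object sits behind each.
kinds = {"file": kind_of(file_fd), "socket": kind_of(sock_fd)}
print(f"file fd={file_fd}, socket fd={sock_fd}, kinds={kinds}")

os.close(file_fd)
os.remove(path)
sock.close()
```

Nothing in the integer itself says "socket" or "file"; the kernel keeps that information in the object the descriptor points to.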
You hear talk of "sockets" all the time, and perhaps you are wondering just what they are exactly. Well, they're this: a way to speak to other programs using standard Unix file descriptors.
What?
Ok—you may have heard some Unix hacker state, "Jeez, everything in Unix is a file!" What that person may have been talking about is the fact that when Unix programs do any sort of I/O, they do it by reading or writing to a file descriptor. A file descriptor is simply an integer associated with an open file. But (and here's the catch), that file can be a network connection, a FIFO, a pipe, a terminal, a real on-the-disk file, or just about anything else. Everything in Unix is a file! So when you want to communicate with another program over the Internet you're gonna do it through a file descriptor, you'd better believe it.
"Where do I get this file descriptor for network communication, Mr. Smarty-Pants?" is probably the last question on your mind right now, but I'm going to answer it anyway: You make a call to the socket() system routine. It returns the socket descriptor, and you communicate through it using the specialized send() and recv() (man send, man recv) socket calls.
"But, hey!" you might be exclaiming right about now. "If it's a file descriptor, why in the name of Neptune can't I just use the normal read() and write() calls to communicate through the socket?" The short answer is, "You can!" The longer answer is, "You can, but send() and recv() offer much greater control over your data transmission."
What next? How about this: there are all kinds of sockets. There are DARPA Internet addresses (Internet Sockets), path names on a local node (Unix Sockets), CCITT X.25 addresses (X.25 Sockets that you can safely ignore), and probably many others depending on which Unix flavor you run. This document deals only with the first: Internet Sockets.
I really like Beej’s writing style.
Reference: Beej’s Guide to Network Programming
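Both claims in the quote, that plain read() and write() work on a socket and that send() and recv() give you more control, can be checked with a small sketch. It uses Python's `socket` module over a TCP connection on 127.0.0.1 and assumes a Unix-like OS (`os.read`/`os.write` on socket descriptors do not work on Windows). `MSG_PEEK` is one real example of that extra control: it returns queued data without removing it from the receive buffer.

```python
import os
import socket

# Build a connected TCP pair on loopback; port 0 lets the kernel pick a free port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(listener.getsockname())
server, _ = listener.accept()

# 1) Plain write()/read() on the raw descriptors: a socket is a file descriptor.
os.write(client.fileno(), b"hello")
via_read = os.read(server.fileno(), 5)

# 2) send()/recv() on the same sockets, with a flag read() cannot express:
#    MSG_PEEK looks at the data but leaves it queued for the next recv().
client.sendall(b"world")
peeked = server.recv(5, socket.MSG_PEEK)
consumed = server.recv(5)

print(via_read, peeked, consumed)

for s in (client, server, listener):
    s.close()
```

The peek and the following ordinary recv() both return the same bytes, because the first call never consumed them.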
From here, it’s clear that when talking about sockets, context matters: what kind of socket are we talking about? For this post, we will continue with Internet sockets. I was in the midst of messing with Node.js when I just had to get sockets right.
Internet sockets
Programmatically, an Internet socket connection can be identified by five values, often called the 5-tuple:
{protocol, local address, local port, remote address, remote port}
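The 5-tuple can be read back from a live connection. This minimal Python sketch opens a TCP connection over loopback and asks each end for its own address (`getsockname()`) and its peer's address (`getpeername()`). The helper `five_tuple` is hypothetical and only distinguishes TCP from UDP.

```python
import socket

# A connected TCP pair on loopback; port 0 lets the kernel pick a free port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(listener.getsockname())
server, _ = listener.accept()


def five_tuple(sock):
    """Return (protocol, local address, local port, remote address, remote port)."""
    proto = {socket.SOCK_STREAM: "TCP", socket.SOCK_DGRAM: "UDP"}.get(sock.type, "other")
    local_addr, local_port = sock.getsockname()
    remote_addr, remote_port = sock.getpeername()
    return (proto, local_addr, local_port, remote_addr, remote_port)


client_tuple = five_tuple(client)
server_tuple = five_tuple(server)
print("client side:", client_tuple)
print("server side:", server_tuple)

for s in (client, server, listener):
    s.close()
```

The two tuples mirror each other: the client's local pair is the server's remote pair and vice versa. The accepted socket on the server reuses the listening port, so it is the full tuple, not the port alone, that tells connections apart.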
There are two main types:
- Stream sockets (SOCK_STREAM): reliable, two-way, connected communication streams
- Datagram sockets (SOCK_DGRAM): connectionless; data may or may not arrive, and may arrive out of order
Stream sockets achieve this reliability by using the Transmission Control Protocol (TCP; RFC 793, now obsoleted by RFC 9293). You usually hear it as "TCP/IP": IP, the Internet Protocol (RFC 791), is responsible for routing packets, not for data integrity.
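The pairing of socket type and protocol shows up in the third argument to socket(). Passing 0 lets the kernel pick the default (TCP for SOCK_STREAM, UDP for SOCK_DGRAM). This minimal Python sketch spells the protocols out and then asks for a mismatched combination. It assumes a Linux- or BSD-like kernel; on Linux the mismatch fails with `EPROTONOSUPPORT`.

```python
import errno
import socket

# Explicitly request the protocol that normally backs each socket type.
tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP)
udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)

# A stream socket cannot be built on top of UDP: the kernel rejects it.
try:
    socket.socket(socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_UDP)
    mismatch_error = None
except OSError as exc:
    mismatch_error = errno.errorcode.get(exc.errno, exc.errno)

print(tcp.proto, udp.proto, mismatch_error)

tcp.close()
udp.close()
```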
Datagram sockets use the User Datagram Protocol (UDP; RFC 768). With datagram sockets you don’t have to maintain an open connection; with stream sockets you do.
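For contrast with the listen()/accept()/connect() steps a TCP connection needs, here is a minimal UDP sketch in Python over loopback, assuming a Unix-like OS. Neither socket ever connects; every sendto() names its destination, and recvfrom() reports where each datagram came from. The 2-second timeout is an arbitrary choice so the example cannot hang if a datagram is lost.

```python
import socket

# Two UDP sockets; only the receiver binds to a known address.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))  # port 0: let the kernel choose
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# No connection state: the destination travels with each individual datagram.
sender.sendto(b"ping", receiver.getsockname())
sender_port = sender.getsockname()[1]  # sendto() bound the sender implicitly

# recvfrom() returns one whole datagram plus the sender's address.
receiver.settimeout(2.0)  # UDP promises no delivery, so don't block forever
data, source = receiver.recvfrom(1024)
print(data, source)

sender.close()
receiver.close()
```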