In this article you will learn how:
BitTorrent is a decentralized network of nodes. Each node discovers relevant nodes through a p2p network called Kademlia.
Nodes share data in pieces that are small and easy to transfer. A torrent's data can then be reconstructed from these pieces.
BitTorrent uses a distributed hash table (DHT), named Kademlia, to discover relevant nodes. A node in Kademlia either owns the desired data or knows other nodes that are closer to it. A hardcoded bootstrap node is used to start the process.
Takeaway 1: Bittorrent nodes are discovered recursively from a hardcoded bootstrap node.
The following script starts from a hardcoded bootstrap node and recursively discovers new nodes with get_peers messages. Discovered nodes get closer at each step until a node that owns the data is found.
The script above has 1 dependency.
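To make "closer" concrete: Kademlia measures the distance between two ids as their bitwise XOR, read as an integer. The sketch below ranks candidate nodes by that distance to a torrent hash. The node ids, the example hash and the default of k = 8 are assumptions chosen for illustration, not values from a real lookup.

```python
import hashlib

def xor_distance(a: bytes, b: bytes) -> int:
    """Kademlia distance: XOR of the two ids, interpreted as an integer."""
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")

def closest_nodes(target: bytes, node_ids: list[bytes], k: int = 8) -> list[bytes]:
    """Return the k node ids with the smallest XOR distance to target."""
    return sorted(node_ids, key=lambda node_id: xor_distance(node_id, target))[:k]

# Hypothetical 20-byte ids, generated only to have something to sort.
info_hash = hashlib.sha1(b"example torrent").digest()
nodes = [hashlib.sha1(bytes([i])).digest() for i in range(20)]
nearest = closest_nodes(info_hash, nodes, k=3)
```

A recursive lookup repeats this step: query the nearest known nodes, merge the nodes they return, and sort again until the distance stops shrinking.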
The DHT nodes communicate over UDP. Each UDP message contains the message type and a payload.
Takeaway 2: Bittorrent messages are sent over UDP and contain a payload and some metadata.
The get_peers message returns nodes closer to a torrent hash, find_node returns nodes closer to a node id and announce_peer indicates that a node is downloading a torrent hash. Then ping is sent periodically to check that a node is online.
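As a rough illustration of what such a message looks like on the wire, DHT queries in BEP_0005 are bencoded dictionaries carrying a transaction id (t), a message type (y), the query name (q) and its arguments (a). The encoder below is a minimal sketch that handles only the types needed here; the function names are illustrative and not taken from any library.

```python
def bencode(value) -> bytes:
    """Minimal bencode encoder for ints, strings/bytes, lists and dicts."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode()
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = [(k.encode() if isinstance(k, str) else k, v) for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")

def get_peers_query(node_id: bytes, info_hash: bytes, tid: bytes = b"aa") -> bytes:
    """Build a get_peers query: t = transaction id, y = "q" marks a query."""
    return bencode({
        "t": tid,
        "y": "q",
        "q": "get_peers",
        "a": {"id": node_id, "info_hash": info_hash},
    })

query = get_peers_query(b"abcdefghij0123456789", b"mnopqrstuvwxyz123456")
```

Sending the query is then a single sendto call on a UDP socket, and the reply is a bencoded dictionary with the same transaction id.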
Takeaway 3: The get_peers message is used to find nodes close to torrent hashes. The ping message is sent periodically to show the node is alive.
The following script finds nodes containing a torrent hash by sending get_peers recursively.
The script above has 1 dependency.
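A get_peers response carries its results in a compact binary form: in BEP_0005, closer DHT nodes arrive as 26-byte records (20-byte node id, 4-byte IPv4 address, 2-byte port) and peers as 6-byte records (address and port). The following is a minimal decoding sketch under that layout; the function names are illustrative.

```python
import socket
import struct

def decode_compact_nodes(blob: bytes) -> list[tuple[bytes, str, int]]:
    """Split the 'nodes' value into (node_id, ip, port) tuples."""
    nodes = []
    # Step through complete 26-byte records; a trailing partial record is ignored.
    for offset in range(0, len(blob) - 25, 26):
        node_id = blob[offset:offset + 20]
        ip = socket.inet_ntoa(blob[offset + 20:offset + 24])
        (port,) = struct.unpack("!H", blob[offset + 24:offset + 26])
        nodes.append((node_id, ip, port))
    return nodes

def decode_compact_peers(values: list[bytes]) -> list[tuple[str, int]]:
    """Decode the 'values' list, where each entry is a 6-byte ip/port pair."""
    peers = []
    for entry in values:
        if len(entry) == 6:
            ip = socket.inet_ntoa(entry[:4])
            (port,) = struct.unpack("!H", entry[4:])
            peers.append((ip, port))
    return peers
```

A response that contains peers ends the search for that branch; a response that contains only nodes gives the next addresses to query.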
BitTorrent nodes communicate over TCP. Each TCP message contains some metadata and a payload. The metadata are the message length and the message id; the payload follows them.
Takeaway 4: Bittorrent messages are sent over TCP and contain a payload and some metadata.
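The framing can be sketched as follows, assuming the layout from the peer wire protocol (BEP_0003): a 4-byte big-endian length that covers the id and the payload, a 1-byte message id, then the payload. The handshake is the one exception and is not framed this way. The helper names are illustrative.

```python
import struct

# Message ids defined by the peer wire protocol.
CHOKE, UNCHOKE, INTERESTED, NOT_INTERESTED, HAVE, BITFIELD, REQUEST, PIECE, CANCEL = range(9)

def encode_message(msg_id: int, payload: bytes = b"") -> bytes:
    """Prefix the id and payload with their combined length."""
    return struct.pack("!IB", 1 + len(payload), msg_id) + payload

def decode_message(buffer: bytes):
    """Return (msg_id, payload, remaining_bytes), or None if the buffer is incomplete.

    A zero-length message is a keep-alive and is reported with msg_id None.
    """
    if len(buffer) < 4:
        return None
    (length,) = struct.unpack("!I", buffer[:4])
    if len(buffer) < 4 + length:
        return None
    if length == 0:
        return None, b"", buffer[4:]
    return buffer[4], buffer[5:4 + length], buffer[4 + length:]
```

Because TCP is a byte stream, a reader keeps appending received bytes to a buffer and calls decode_message until it returns None.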
First a handshake message should be sent to establish a connection. Then the interested and unchoke messages signal that a node wants the torrent's data and that it is willing to send data, respectively. The bitfield message reports which pieces the sender owns, and a piece message carries the data at a specific offset.
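The bitfield payload can be read as shown in this small sketch, assuming the BEP_0003 convention that the high bit of the first byte stands for piece 0. The function names are illustrative.

```python
def has_piece(bitfield: bytes, index: int) -> bool:
    """True if the bit for piece `index` is set (most significant bit first)."""
    byte_index, bit_index = divmod(index, 8)
    if byte_index >= len(bitfield):
        return False
    return bool(bitfield[byte_index] & (0x80 >> bit_index))

def owned_pieces(bitfield: bytes, num_pieces: int) -> list[int]:
    """List the indexes of all pieces the remote node reports owning."""
    return [i for i in range(num_pieces) if has_piece(bitfield, i)]
```

For example, a one-byte bitfield of 0b10100000 for a four-piece torrent means the node owns pieces 0 and 2.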
Takeaway 5: The handshake message is used to establish a connection. Then interested, unchoke and bitfield messages are used to establish interest, and piece messages to transfer the torrent.
We can establish a connection to a BitTorrent node by sending handshake, interested and unchoke messages. Data can then be requested from the node, which returns it in piece messages.
The script above has 1 dependency.
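The handshake itself has a fixed 68-byte layout in BEP_0003: a length byte (19), the string "BitTorrent protocol", 8 reserved bytes, the 20-byte torrent hash and a 20-byte peer id. Here is a minimal sketch of building and checking it; the peer id in the usage note is a made-up placeholder.

```python
PSTR = b"BitTorrent protocol"

def build_handshake(info_hash: bytes, peer_id: bytes, reserved: bytes = bytes(8)) -> bytes:
    """Assemble the 68-byte handshake sent right after the TCP connection opens."""
    if len(info_hash) != 20 or len(peer_id) != 20 or len(reserved) != 8:
        raise ValueError("info_hash and peer_id must be 20 bytes, reserved 8 bytes")
    return bytes([len(PSTR)]) + PSTR + reserved + info_hash + peer_id

def parse_handshake(data: bytes) -> tuple[bytes, bytes, bytes]:
    """Return (reserved, info_hash, peer_id) from a received handshake."""
    if len(data) != 68 or data[0] != len(PSTR) or data[1:20] != PSTR:
        raise ValueError("not a BitTorrent handshake")
    return data[20:28], data[28:48], data[48:68]
```

A client would send build_handshake(info_hash, b"-XX0001-000000000000") and verify that the info_hash in the reply matches the torrent it asked for before sending anything else.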
Torrents are transferred in smaller pieces for efficiency. These pieces are in turn split into even smaller units called blocks. When all blocks constituting a piece have been received, the data is checked for correctness against the piece hash. To know the piece size, a BitTorrent node needs the torrent's metadata.
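The split into blocks and the hash check can be sketched as follows. The 16 KiB block size is a common choice and is an assumption here, as are the function names; piece hashes in the original protocol (BEP_0003) are SHA-1 digests.

```python
import hashlib

BLOCK_SIZE = 2 ** 14  # 16 KiB; assumed block size, the piece size comes from the metadata

def block_ranges(piece_length: int, block_size: int = BLOCK_SIZE) -> list[tuple[int, int]]:
    """Return (begin, length) for every block in a piece; the last may be shorter."""
    return [
        (begin, min(block_size, piece_length - begin))
        for begin in range(0, piece_length, block_size)
    ]

def verify_piece(blocks: dict[int, bytes], expected_sha1: bytes) -> bool:
    """Join blocks by offset and compare the SHA-1 digest with the piece hash."""
    data = b"".join(blocks[begin] for begin in sorted(blocks))
    return hashlib.sha1(data).digest() == expected_sha1
```

Blocks may arrive in any order, which is why they are keyed by offset and sorted before hashing.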
When the BitTorrent network was first introduced in 2001, it used specialized servers, called trackers, to share the torrent metadata. Trackers stored the torrent's piece hashes, peer information, information about the included files, the torrent size and other details, but not the torrent's actual data.
BitTorrent's DHT was introduced in 2008 with BEP_0005 to eliminate centralized parties and allow the network to transfer files in a completely decentralized manner. Clients establish a connection with BitTorrent nodes with a handshake, then an extended handshake (as in BEP_0010) and then a ut_metadata request (as in BEP_0009) for the torrent metadata.
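A rough sketch of those two extension messages, assuming the layouts in BEP_0010 and BEP_0009: extension messages use message id 20 followed by a one-byte extension id (0 for the extended handshake) and a bencoded dictionary. The local id of 1 for ut_metadata and the helper names are assumptions for illustration; the id used in the request must be the one the remote node announced in its own extended handshake.

```python
import struct

EXTENDED = 20  # message id used by all extension messages

def bencode(value) -> bytes:
    """Tiny bencode encoder, limited to ints, strings and dicts with string keys."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, (bytes, str)):
        raw = value.encode() if isinstance(value, str) else value
        return b"%d:%s" % (len(raw), raw)
    if isinstance(value, dict):
        return b"d" + b"".join(bencode(k) + bencode(value[k]) for k in sorted(value)) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")

def extended_message(ext_id: int, payload: dict) -> bytes:
    """Frame a bencoded payload as an extension message."""
    body = bytes([EXTENDED, ext_id]) + bencode(payload)
    return struct.pack("!I", len(body)) + body

LOCAL_UT_METADATA_ID = 1  # hypothetical id this client assigns to ut_metadata

# Extended handshake: announce which extensions this client supports.
ext_handshake = extended_message(0, {"m": {"ut_metadata": LOCAL_UT_METADATA_ID}})

def metadata_request(peer_ut_metadata_id: int, piece: int) -> bytes:
    """Ask for one piece of the metadata; msg_type 0 means 'request'."""
    return extended_message(peer_ut_metadata_id, {"msg_type": 0, "piece": piece})
```

To signal support for extension messages in the first place, the initial handshake sets the 0x10 bit in the sixth reserved byte.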
Clients first establish a connection with a handshake and an interested message. Once they receive an unchoke message, they request data from other nodes, which return it in piece messages. For better download performance, clients can connect to multiple nodes in the network and request piece data in parallel. Downloaded pieces are hashed, and these hashes are checked against the piece hashes from the torrent's metadata.
The script below connects to a BitTorrent node and requests the first piece of a torrent.
The script above has 1 dependency.
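The wire messages for requesting a piece can be sketched like this, assuming the BEP_0003 layout: a request message (id 6) carries the piece index, the byte offset and the block length, and the answering piece message (id 7) carries the index, the offset and the data. The 16 KiB block size and the function names are assumptions for illustration.

```python
import struct

REQUEST, PIECE = 6, 7
BLOCK_SIZE = 2 ** 14  # 16 KiB, assumed block size

def request_message(index: int, begin: int, length: int) -> bytes:
    """Length prefix (13), id 6, then three 4-byte big-endian integers."""
    return struct.pack("!IBIII", 13, REQUEST, index, begin, length)

def requests_for_piece(index: int, piece_length: int) -> list[bytes]:
    """One request per block of the piece; the last block may be shorter."""
    return [
        request_message(index, begin, min(BLOCK_SIZE, piece_length - begin))
        for begin in range(0, piece_length, BLOCK_SIZE)
    ]

def parse_piece_payload(payload: bytes) -> tuple[int, int, bytes]:
    """Split a piece message payload into (index, begin, block_data)."""
    index, begin = struct.unpack("!II", payload[:8])
    return index, begin, payload[8:]

# Requests for the first piece of a torrent with hypothetical 32 KiB pieces.
first_piece_requests = requests_for_piece(0, 32 * 1024)
```

After an unchoke has been received, these byte strings are written to the TCP socket and the returned blocks are collected by offset until the piece is complete.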