01 May 2009

Idle thoughts IV: "example"

Now that I've read (skimmed) [parts of] the articles for BitTorrent, DHT and a popular torrent implementation I can say a bit more.

Sorry as always for using 'you'. Browsing a torrent website for user-created contents, you find a file that you want to watch. There is a list of torrents, or a link to a list of torrents, which contain this file, with basic information such as seeders, leechers, total downloads and date added. You click on one of these torrents, which takes you to its download page.

Part of this page, maybe a tab, shows torrent history or which torrent collections it draws from, dates and authors and size and so on, maybe a quick summary of how many files it draws from each previous torrent and how many it adds that weren't in previous torrents. Might have a more detailed diff but that's not something everyone would care to see on the basic display especially for large torrents.
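To make the 'quick summary' idea a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the function name `summarize_history`, the idea that each file has some key (a hash, say), and the rule that a file is credited to the first previous torrent in the declared history that contains it.

```python
from collections import Counter

def summarize_history(new_torrent, previous_torrents):
    """Count how many files of new_torrent come from each earlier torrent.

    new_torrent: dict mapping a file key (e.g. a content hash) to the
        file's name as listed in this torrent.
    previous_torrents: dict mapping torrent name to a set of file keys,
        in declared history order (hypothetical layout).
    Returns (counts per previous torrent, names of newly added files).
    """
    drawn = Counter()
    added = []
    for key, name in new_torrent.items():
        for torrent_name, keys in previous_torrents.items():
            if key in keys:
                # Credit the first previous torrent that has this file.
                drawn[torrent_name] += 1
                break
        else:
            # No previous torrent contains it, so it is new here.
            added.append(name)
    return dict(drawn), added
```

A more detailed diff would just list the names instead of counting them.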

The other part of the page, other than author and date and so on, shows the files contained within the torrent. Every file that is tracked on this torrent site is shown under its name on the site, with a link to its page; the listing also includes the name of the file as it appears in this particular torrent, as it may be different from the name on the website. It also lists the names, and maybe a hash or key, of files which are not tracked on this site, as well as basic information such as their size.


Due to tracker issues, it is not possible to individually track every single file in a torrent in all cases. This applies to DHT too: it does not seem reasonable for every seed to publish hashes for thousands of different files, as DHT nodes have to store this information many times over for many different seeds. Even if uTorrent only shows up as 10MB memory usage in Windows, I'm sure (in the nontechnical sense) that storing DHT/tracker info for all files in a torrent would be a significant burden on some systems.

So the compromise in usual cases would be to track folders. Current torrent implementations haven't added a 'file layer' to their extensions because it seems an unnecessary burden on trackers and DHT nodes; not to mention that the major benefits to reseeding can only be realized through integration with the file manager, which there would seem to be no reason to do. But it's necessary for user-created contents, where the goal is not to mass download low-quality content in bulk, as on most torrent sites like The Pirate Bay; nor to download certain files the user is already aware of prior to looking them up, which all torrent implementations are currently capable of; nor to track a finite quantity of high-quality content, as with D-Addicts.

The goal for user-created contents is to encourage spreading and keeping high-quality content, instead of feeling the need to collect more low-quality content so you can sort it out yourself to find what interests you. When there is no mechanism to keep and update high-quality collections of content, it encourages this 'race' mentality, which not only affects the overall pattern of content sharing but also affects attitudes towards the content itself. That is especially important for socially-derived content like an MMO. (It also affects attitudes towards other types of shared files, which I won't mention. >_>)

I have no idea what any of that had to do with 'tracking folders is a compromise'. anyway!~ =p This would NOT be 'Mainline DHT', at least I don't think so; it would mean tracking keys or hashes that don't associate with specific names, and I don't know if that's how BitTorrent Mainline DHT works. Many things you can add onto a normal client as an extension, which will only work for interacting with other clients with the same functionality but will still be able to function normally in other cases.


Adding a torrent to a website: I've never created a torrent myself or read the FAQs on how to create one, but this is generally how it would work. Standardized hash boundaries and so on... this might get a little BitComet-ty in creating what look like bad torrents for other clients >_< Basically, if you were to create a torrent out of three files on your system, and someone else creates a torrent out of the same three files on their system, the hashes need to be the same. This means restricting whatever options are out there for hashing, at least if they are selected for inclusion in the semantic 'file layer' of this BitTorrent protocol extension.
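Here is a rough sketch of what 'standardized hash boundaries' could mean, in Python. The fixed piece size, the choice of SHA-1 and the name `file_layer_entry` are all my assumptions, not anything taken from an existing client. The point is only that pieces are cut from the start of each file (as far as I understand, normal torrents cut pieces across all the files joined together), and that the file's name is left out, so the same bytes give the same hashes on anyone's system.

```python
import hashlib

# Assumed fixed piece size for the 'file layer'; a real extension would
# have to standardize this (possibly per file-size range).
PIECE_SIZE = 256 * 1024

def file_layer_entry(data: bytes, piece_size: int = PIECE_SIZE) -> dict:
    """Build a name-independent description of one file.

    Pieces always start at byte 0 of the file, so the piece hashes do
    not depend on what other files share the torrent.
    """
    pieces = [
        hashlib.sha1(data[i:i + piece_size]).hexdigest()
        for i in range(0, len(data), piece_size)
    ]
    return {
        "size": len(data),
        "piece_size": piece_size,
        "pieces": pieces,
        # Overall hash, usable as the file's key on a tracker or DHT.
        "key": hashlib.sha1(data).hexdigest(),
    }
```

Two people running this on the same file get identical entries, whatever they called the file and whatever else is in their torrents. A real client would read the file in chunks instead of holding it all in memory.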

So prior to adding a torrent to a website; creating a torrent. You have three files and a folder with 70 files that you want to put into a torrent.

File#1 is tracked individually, so it has standardized hash boundaries/piece size and its own, overall hash for inclusion in DHT. I don't know how this relates to Merkle Hash Tree torrents (future reading topic, maybe never...). It is a file that you created yourself and uploaded the standardized hash to the tracking website already, with a description of the file's content and credits.

File#2 is also tracked individually, but was uploaded to the website months ago and was created by you or someone else.

File#3 is a short descriptive file that you are not marking as tracked individually because it's too small to bother with DHT and not relevant to any other torrent collection. (This is the part I'm not sure about... it still needs to be included in any reseeding, so might it be best to compress it into the torrent file itself, as far as file management goes? Jigdo concept >_< Maybe it could have its own extension and the client could designate a single folder for these files. That way everything that downloads into the normal folder, to be later moved, sorted and renamed as needed by the user, would be 'content' like a movie that has its own place, instead of descriptive 'metadata' that just gets lost or deleted...? Sounds good.~ ) uhmm. So in other words, whether the torrents track a file is separate from whether the client (or some implementation-independent 'torrent thumbnail' operating system standard) tracks the locations of files that are eligible for inclusion in torrent reseeding. Small files may still need to be tracked in the filesystem by the OS/client even if they are not included on trackers or DHT, including these unique-extension (or magic number) 'descriptive' files that go in their own folder.

The folder is selected to be tracked as a unit, which means it has its own DHT and tracker entry (for trackers which allow it). The folder can have its name changed, but the files inside of it cannot, because if their names were changed there'd be no way to reference them in the torrent file (to map files on the seeder's filesystem to pieces that other peers request). In contrast, files that are individually tracked in a torrent can be identified in the filesystem by their hash and size and other information stored in the torrent. Folder tracking is appropriate for large numbers of files such as pictures, where the files are neither changed (as is the case with a program, as if anyone should be downloading executables with torrents....) nor renamed (because they're too numerous and there's no reason to do so since they're in their own folder with no namespace intersections). Folders could be combined and the files pooled, as long as there were no namespace intersections that would prevent a torrent from finding files by name. Note that trying to seed a torrent where such a collision had taken place would only mean that certain pieces would not hash correctly, so the seed would be just partially complete.
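The difference between the two kinds of lookup can be sketched like this in Python. Both function names are made up for illustration, and I'm assuming the file's key is a SHA-1 over the whole file; a real client would keep an index instead of rehashing the disk every time.

```python
import hashlib
import os

def find_by_hash(root, size, key):
    """Find an individually tracked file anywhere under root.

    The file is matched by size and SHA-1 key (assumed key format),
    so it is still found after being moved or renamed.
    """
    for dirpath, _dirs, files in os.walk(root):
        for fname in files:
            path = os.path.join(dirpath, fname)
            if os.path.getsize(path) != size:
                continue  # cheap check first; hash only on a size match
            with open(path, "rb") as fh:
                if hashlib.sha1(fh.read()).hexdigest() == key:
                    return path
    return None

def find_in_folder(folder, names):
    """Find files of a folder tracked as a unit, by name only.

    Returns a dict of name -> path, with None for any file that is
    missing or was renamed, since names are the only reference.
    """
    found = {}
    for name in names:
        path = os.path.join(folder, name)
        found[name] = path if os.path.isfile(path) else None
    return found
```

A renamed picture inside the tracked folder simply shows up as `None` here, which is the 'partially complete seed' case.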


Finally you can upload to the website. This includes information about torrent history. You could start this process from an existing torrent, altho not without creating the torrent yourself to confirm that you can actually seed the torrent once you upload the torrent file; you could also start off fresh and search for known previous torrents to add to your torrent's declared history. After you add a description (POSSIBLY), the website would analyze the torrent you uploaded and associate any files in the torrent with ones on the website, providing links as previously mentioned. I think that's it; the tracker would accept it and you could start seeding then.

Now suppose someone adds your torrent to their client. They would ping the tracker or whatever stuff a client does when it adds a torrent. Their client would see you as a seed. It would look at the torrent file, and see that files 1 and 2, and the folder, are tracked individually. It would attempt to download file 3 from you as this torrent is the only one with that file, as well as any other similarly unique pieces of the torrent. It would ask the tracker if it knows any other torrents that contain file 1, file 2, or the folder. In this case the tracker is only allowing individual tracking of files that were uploaded to and described on the website (might require full upload, not just hash, for authentication purposes..? maybe.), but other trackers and websites might be more expansive; or more restrictive if they do not support this 'file layer' bittorrent protocol extension.

The tracker only lists this torrent as containing file 1, since you only created and uploaded it today; but it does list three other torrents which contain file 2. This new peer, named Nishikado, asks one of the peers on those torrents for the torrent file via Magnet link to obtain information about where in the torrent the file is that he wants.
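The client's decision in this example could look something like the following Python sketch. The data layout, the name `plan_sources` and the shape of the tracker's answer are all assumptions; untracked items carry no key and can only come from the torrent that was added.

```python
def plan_sources(torrent, tracker_index):
    """Decide which torrents each item can be downloaded from.

    torrent: {"id": str, "items": [{"name": str, "key": str or None}]}
        where key is None for items not tracked individually.
    tracker_index: dict mapping key -> list of torrent ids the tracker
        knows to contain that key (hypothetical tracker answer).
    Returns a dict of item name -> list of torrent ids, own torrent first.
    """
    plan = {}
    for item in torrent["items"]:
        sources = [torrent["id"]]
        if item["key"] is not None:
            for other in tracker_index.get(item["key"], []):
                if other not in sources:
                    sources.append(other)
        plan[item["name"]] = sources
    return plan
```

For the example here, file 1 and file 3 come back with only the new torrent, the folder too (this tracker does not list it), and file 2 comes back with the new torrent plus the three older ones.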

Right about now his client does something weird with the display, either it shows the torrents he's leeching from or it doesn't show the details by default... I don't know... it might only work for peers that support this protocol extension, and it might be best if it can only connect to such peers to prevent a perception that clients with this protocol extension are 'selfish'. . .

. . . I think that's it. Connect to peers, download file. o_O It does work a bit differently depending on whether it supports all peers or just ones with this protocol as described above, but almost everything happens prior to finding a peer to download from. example done.

And then there's all the social implications of a bittorrent extension that allows the community to refine torrent collections, but that's a different story that hopefully will eventually be finished one day. . . .

Oh, and Nishikado's client also checks for the folder on DHT in case someone has it despite the tracker not viewing it as a separate file, since the torrent provides an overall hash, and standardized piece/hash sizes for the folder in case other torrents do contain it. Your own client for its part broadcasts file 1, file 2, the folder, as well as the overall torrent to nearby nodes in the hash space on DHT.
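As for 'nearby nodes in the hash space': Kademlia-style DHTs (which, as far as I know, is what BitTorrent's DHT is based on) measure closeness by XOR-ing the key with each node ID. A tiny Python sketch, where the function name and the hex-string inputs are just my assumptions for illustration:

```python
def closest_nodes(key_hex, node_ids_hex, k=8):
    """Return the k node IDs closest to key_hex by XOR distance.

    IDs and key are hex strings of the same length (assumption);
    a smaller XOR value means 'nearer' in the hash space.
    """
    key = int(key_hex, 16)
    return sorted(node_ids_hex, key=lambda n: int(n, 16) ^ key)[:k]
```

Your client would do this once per key it announces: file 1, file 2, the folder, and the overall torrent.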

