For example, there are a gazillion dot-com domains, with sites ranging alphabetically from an aardvark wildlife preservation site to the Zeus Greek god fan club. Looked at from an ordered perspective, then yes, the internet is a diabolical mess.
We create categories to index the web so the server won’t have to bother running through the whole a-z domain list, sifting through sites already known not to be what is wanted. Then we arrange the sites within each category in alphabetical order.
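The idea can be sketched in a few lines of Python. This is only an illustration: the domain names, the category labels and the function name build_category_index are made up for the example, and a real index would be far bigger and more complicated.

```python
from collections import defaultdict

def build_category_index(sites):
    """Group (domain, category) pairs by category and sort each group a-z."""
    index = defaultdict(list)
    for domain, category in sites:
        index[category].append(domain)
    # Sort the domains inside each category alphabetically.
    return {category: sorted(domains) for category, domains in index.items()}

# Hypothetical sample data, not real sites.
sites = [
    ("zeusfanclub.example", "mythology"),
    ("aardvarkwildlife.example", "wildlife"),
    ("olympusgods.example", "mythology"),
]

category_index = build_category_index(sites)
print(category_index["mythology"])
# ['olympusgods.example', 'zeusfanclub.example']
```

A lookup now only touches the one category it needs, instead of walking the whole a-z list.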
A human internet user always looks at something from an ordered perspective. This is human intelligence. Say someone wants to find a site about digging into locked source code; this person will do a search. What to search for? The thought will not start in alphabetical order. In what order, then? In the order of websites categorized as having this content. How do we get such a category? Does it exist? Yes: Google indexes gazillions of websites and stores them as categories in server caches, so that results reach the searcher almost instantly.
The network protocol program (robot, spider, whatever it’s called) is an automated computer thought (an algorithm) that compares and matches the input keyword. The density and relevance of the keyword in a sentence and paragraph, and the continuity of its concept throughout the site’s pages, determine the page’s position in the index and ultimately in the search results. The measure of credibility and reputation is link based: the robot traces the links to and from a site, and to and from each of its pages too.
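To make the keyword density part concrete, here is a rough Python sketch. It assumes density simply means "occurrences of the keyword divided by total words on the page", which is a simplification chosen for this example and not a claim about how any real search engine scores pages. The page URLs and the function names keyword_density and rank_pages are hypothetical.

```python
import re

def keyword_density(text, keyword):
    """Share of the words in `text` that are exactly `keyword` (case-insensitive)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)

def rank_pages(pages, keyword):
    """Return page URLs ordered from highest to lowest keyword density."""
    return sorted(pages, key=lambda url: keyword_density(pages[url], keyword), reverse=True)

# Hypothetical pages, not real sites.
pages = {
    "b.example/home": "a fan club for greek gods",
    "a.example/intro": "source code tools for reading source code",
}

print(rank_pages(pages, "source"))
# ['a.example/intro', 'b.example/home']
```

The link-based credibility and reputation measure is left out here; this sketch only covers the keyword matching step.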
Once again, concept continuity rings through. These external sites and pages are therefore treated as extensions of a single page in a site. One site can have many of these page “extensions”. This makes the site more important in the indexer robot’s eyes, and the robot will look here first when preparing the next search result. The worth of the on-site content is judged by the robot from the aforementioned credibility and reputation perspective. The robot does not care whether a genius software professor from MIT wrote the content. The robot doesn’t care whether the content is legit or not. What the robot cares about is how many UNIQUE times the pages on the site have been viewed, how big the site’s collection of page “extensions” is (the credibility factor), and how much the number of “extensions” has gone up or down within a set period (the reputation factor).
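The three things the robot cares about can be written down as a small Python record. This is a sketch under assumptions: the class name SiteStats, its field names and the sample numbers are invented for illustration, and reputation is taken to be the plain difference in the "extension" count between the start and the end of the period.

```python
from dataclasses import dataclass

@dataclass
class SiteStats:
    unique_views: int        # how many unique times the site's pages were viewed
    extensions_now: int      # linked page "extensions" at the end of the period
    extensions_before: int   # linked page "extensions" at the start of the period

    @property
    def credibility(self):
        """Size of the site's collection of page "extensions"."""
        return self.extensions_now

    @property
    def reputation(self):
        """Up or down change in the number of "extensions" over the period."""
        return self.extensions_now - self.extensions_before

# Hypothetical numbers for one site.
stats = SiteStats(unique_views=1500, extensions_now=42, extensions_before=35)
print(stats.unique_views, stats.credibility, stats.reputation)
# 1500 42 7
```

Note that nothing in this record says who wrote the content or whether it is legit, which is the point made above.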
Now back to the “Overlord” indexer robot. Is this only one robot? Or is it a number of robots, each with a specific function? The latter is more likely when dealing with massive amounts of dynamic data. This way, each robot is less complicated to work on (maintenance, for example). One robot maintains and updates the main category database, and one robot maintains and updates the sub category database (with a few more as required).
One robot checks the results offered by the sub category robots, one decides which category and sub category to pull result data from, one displays the result, one traces site credibility, one traces site reputation, one gets the user input, and one counts the unique visitors on a site. One robot decides the arrangement of sites in the main category for first access according to the 3 rules (uv, c, r: unique views, credibility, reputation), and one more does the same for the sub category.
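A few of these single-purpose robots can be sketched as small Python functions chained together. Everything here is hypothetical: the database contents, the site names and the function names are made up, and the way the 3 rules are combined (compare uv first, then c, then r) is just one possible reading, since the order of importance is not spelled out above.

```python
# Hypothetical category database: category -> {site: (uv, c, r)}.
CATEGORY_DB = {
    "source code": {
        "hexview.example": (120, 5, 4),
        "oldtools.example": (40, 9, -1),
        "decompile.example": (120, 8, 2),
    },
}

def get_user_input(raw):
    """Robot that gets the user input: tidy up the raw query."""
    return raw.strip().lower()

def choose_category(query, db):
    """Robot that decides which category to pull result data from."""
    for category in db:
        if category in query:
            return category
    return None

def arrange_sites(sites):
    """Robot that arranges sites by the 3 rules, highest (uv, c, r) first."""
    return sorted(sites, key=lambda name: sites[name], reverse=True)

def search(raw, db=CATEGORY_DB):
    """Chain the robots: input, category choice, arrangement."""
    query = get_user_input(raw)
    category = choose_category(query, db)
    if category is None:
        return []
    return arrange_sites(db[category])

print(search("  Tools for locked SOURCE CODE "))
# ['decompile.example', 'hexview.example', 'oldtools.example']
```

Because each robot is its own function, any one of them can be fixed or swapped without touching the rest, which is the maintenance argument made above.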