On Information and Interactions on the Web
Abstract
In the past few years, the most spectacular advances in research have, to a large extent, been made possible by the Web. The field of computer vision owes much to ImageNet, a dataset compiled from images found on the Web. In natural language processing, large language models such as BERT or GPT are trained on large corpora found on the Web, including Wikipedia and CommonCrawl data. Yet the Web holds value in itself not because it makes it easy to publish and access large dumps of data (most open data is hard to use because it is heterogeneous and lacks contextual information) but because pieces of information coming from different sources can easily be interlinked.
That makes the Web a source of information that is essentially relational. The Resource Description Framework (RDF) captures that essence by restricting its data model to triples of the form ⟨resource, relation, resource⟩, which generalizes hyperlinks. As a consequence, Web agents should be able to deal with relational data, if not RDF data.
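To make the triple model concrete, here is a minimal sketch in Python that stores ⟨resource, relation, resource⟩ triples as plain tuples and follows them the way one follows hyperlinks. All IRIs, the `Triple` type, and the helper functions `objects` and `subjects` are hypothetical names chosen for illustration; they are not taken from this work or from any RDF library. The sketch also follows the simplified form given above, in which both ends of a triple are resources.

```python
from typing import NamedTuple, Set


class Triple(NamedTuple):
    """One statement of the form <resource, relation, resource>."""
    subject: str
    relation: str
    obj: str


# Hypothetical IRIs, used only for illustration.
PAGE_A = "http://example.org/pageA"
PAGE_B = "http://example.org/pageB"
ALICE = "http://example.org/people/alice"
LINKS_TO = "http://example.org/vocab#linksTo"
AUTHOR = "http://example.org/vocab#author"

# A graph is simply a set of triples. A plain hyperlink is the special
# case where the relation is an untyped "links to".
graph: Set[Triple] = {
    Triple(PAGE_A, LINKS_TO, PAGE_B),
    Triple(PAGE_A, AUTHOR, ALICE),
    Triple(PAGE_B, AUTHOR, ALICE),
}


def objects(g: Set[Triple], subject: str, relation: str) -> Set[str]:
    """Resources reached from `subject` by following `relation` forward."""
    return {t.obj for t in g if t.subject == subject and t.relation == relation}


def subjects(g: Set[Triple], relation: str, obj: str) -> Set[str]:
    """Resources that point to `obj` through `relation`."""
    return {t.subject for t in g if t.relation == relation and t.obj == obj}
```

Because every statement names its relation, an agent can ask not only "what does this page link to?" but also "which resources share this author?", for instance with `subjects(graph, AUTHOR, ALICE)`. The second kind of question cannot be expressed with untyped hyperlinks alone.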