
Relational databases power many of Google’s prediction services that people use every day, such as content recommendation and traffic prediction, and relational tables remain the dominant format for enterprise data. Most non-trivial applications use multiple tables; some complex Google applications maintain hundreds of them, making it difficult to extract actionable value from such networks of tables. Traditional tabular machine learning (ML) methods, like decision trees, often struggle to fully leverage the connectivity structure of these relational schemas.
On the other hand, recent advances in ML have produced graph neural networks (GNNs), a family of models designed to work with graph-structured data. GNNs can perform business-relevant tasks such as node classification or regression, link prediction, and graph-level prediction.
However, most GNNs are tied to the particular graph they were trained on and cannot generalize to novel graphs with new node types, edge types, features, and labels. A model trained on a 100M-node citation graph benchmark, for instance, cannot be applied to your own graph (such as transactions between users and products) because the feature and label spaces differ so vastly; you would need to retrain the model on your own data. Despite initial attempts demonstrating feasibility on specific link prediction and node classification tasks, there has not yet been a general model that learns meaningful representations across relational data and tackles all node-, link-, and graph-level prediction tasks. Today, we explore whether a single model can excel on interconnected relational tables while at the same time generalizing to any arbitrary set of tables, features, and tasks without additional training. We are thrilled to share our recent progress in building graph foundation models (GFMs) that significantly expand the boundaries of tabular machine learning and graph learning.
Relational tables as graphs
We argue that exploiting the connectivity structure between tables is essential for effective ML algorithms and improved downstream performance, even when tabular feature data, such as price, size, or category, is sparse or noisy. As a result, the only data preparation step is to convert a collection of tables into a single heterogeneous graph. The procedure is straightforward and scales well: each table becomes a distinct node type, and each row in a table becomes a node. For each row, its foreign-key relations become typed edges to the corresponding nodes in other tables, while the remaining columns are treated as node features (typically numerical or categorical values). Optionally, we can also keep temporal information as node or edge features.
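As a minimal sketch of this conversion, the snippet below turns a collection of tables into node and edge sets. The table and column names (“users”, “orders”, “user_id”) and the use of a single “id” primary key are illustrative assumptions, not a schema or library the post prescribes:

```python
# Hypothetical sketch: converting relational tables into a heterogeneous graph.
# Rows become nodes, foreign keys become typed edges, and the remaining
# columns become node features.

def tables_to_graph(tables, foreign_keys):
    """tables: {table_name: list of row dicts, each with an 'id' primary key}.
    foreign_keys: {(table_name, column): referenced_table_name}."""
    nodes = {}   # (table, row_id) -> feature dict (non-key columns)
    edges = []   # (source_node, edge_type, destination_node)
    for table, rows in tables.items():
        for row in rows:
            node_id = (table, row["id"])
            feats = {}
            for col, val in row.items():
                if col == "id":
                    continue
                if (table, col) in foreign_keys:
                    # Foreign key: add a typed edge to the referenced row.
                    dst_table = foreign_keys[(table, col)]
                    edges.append((node_id, f"{table}.{col}", (dst_table, val)))
                else:
                    feats[col] = val  # numerical/categorical node feature
            nodes[node_id] = feats
    return nodes, edges

# Illustrative two-table example: orders reference users via user_id.
users = [{"id": 1, "age": 34}, {"id": 2, "age": 27}]
orders = [{"id": 10, "user_id": 1, "price": 9.99}]
nodes, edges = tables_to_graph(
    {"users": users, "orders": orders},
    {("orders", "user_id"): "users"},
)
```

Each node keeps its table name as its node type, so the resulting graph is heterogeneous by construction, and the procedure touches each row only once.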
Transforming relational tables into graphs yields a separate graph for each target domain, each with its own node types, edge types, node features, and node labels. The next obstacle is designing a single, generalizable ML model that can be trained on one graph (one set of tables) and run inference on any unseen graph, even though the structures and schemas differ.
Graph foundation models
A common recipe for building foundation models is to train a high-capacity neural network, such as a Transformer, on a large amount of diverse data. For GFMs, a unique obstacle is the absence of a common tokenization mechanism for graphs. In language and vision models, by contrast, any string can be represented via tokens from a fixed vocabulary, and images and videos can be encoded via patches.
This necessitates transferable strategies for encoding arbitrary database schemas, regardless of the number of node and edge types, and for handling node features, for instance, creating a fixed-size node representation from thirty categorical features or from three continuous float features. Because we want a single model that generalizes to any tables and node types, such as training on citation graphs and running inference on product graphs, we cannot rely on hard-coded embedding tables of node types. Similarly, we want a model that can be trained on arbitrary floats and categorical features like “price” and “size” yet transfer to features like “length” and “season.” Our main finding is that models trained on “absolute” dataset features, such as embedding tables or projections hard-coded to a particular feature distribution, do not generalize, whereas capturing how features interact with one another across a variety of tasks improves generalization.
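To make the fixed-size-representation requirement concrete, here is a toy sketch (not the architecture described in the post) of a set-style encoder: each feature value is mapped to a vector and the per-feature vectors are mean-pooled, so the output dimension is the same whether a node carries three floats or thirty categories. The `DIM` constant and hash-based categorical encoding are illustrative assumptions:

```python
# Hypothetical sketch: a fixed-size node representation from an arbitrary
# set of numerical/categorical features via per-feature encoding + pooling.
import hashlib
import numpy as np

DIM = 8  # output dimension, fixed regardless of how many features a node has

def encode_value(value):
    """Encode a single feature value as a DIM-dimensional vector."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        # Numerical: scale a shared fixed direction by the value.
        direction = np.random.default_rng(0).standard_normal(DIM)
        return float(value) * direction
    # Categorical: a deterministic pseudo-random vector seeded by the
    # value's hash, so unseen categories still map to stable vectors.
    seed = int(hashlib.md5(str(value).encode()).hexdigest(), 16) % (2**32)
    return np.random.default_rng(seed).standard_normal(DIM)

def encode_node(features):
    """Mean-pool per-feature encodings: any schema -> a DIM-sized vector."""
    if not features:
        return np.zeros(DIM)
    return np.mean([encode_value(v) for v in features.values()], axis=0)

# Two nodes with entirely different schemas produce same-shaped vectors.
a = encode_node({"price": 9.99, "size": 2.0, "season": "winter"})
b = encode_node({"length": 1.2})
```

The pooling step is what removes the dependence on a particular feature count or column order; a real model would replace the fixed random maps with learned, distribution-agnostic encoders.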