# Using TB-Net Whitebox Recommendation Model

[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/xai/docs/source_en/using_tbnet.md)

## What is TB-Net

TB-Net is a white box recommendation model. It constructs subgraphs in a knowledge graph from the interactions between users and items and from the features of the items, then calculates paths in those subgraphs using a bidirectional conduction algorithm. The result is a set of explainable recommendations.

Paper: Shendi Wang, Haoyang Li, Caleb Chen Cao, Xiao-Hui Li, Ng Ngai Fai, Jianxin Liu, Xun Xue, Hu Song, Jinyu Li, Guangye Gu, Lei Chen. [Tower Bridge Net (TB-Net): Bidirectional Knowledge Graph Aware Embedding Propagation for Explainable Recommender Systems](https://ieeexplore.ieee.org/document/9835387)

## Preparations

### Downloading Data Package

First, download the data package and put it under the `models/whitebox/tbnet` directory of a local XAI [source package](https://gitee.com/mindspore/xai):

```bash
wget https://mindspore-website.obs.myhuaweicloud.com/notebook/datasets/xai/tbnet_data.tar.gz
tar -xf tbnet_data.tar.gz

git clone https://gitee.com/mindspore/xai.git
mv data xai/models/whitebox/tbnet
```

`xai/models/whitebox/tbnet/` files:

```bash
.
└─tbnet
    ├─README.md
    ├─README_CN.md
    ├─data
    │  └─steam              # Steam user purchase history dataset
    │      ├─LICENSE
    │      ├─config.json    # hyper-parameters and training configuration
    │      ├─src_infer.csv  # source datafile for inference
    │      ├─src_test.csv   # source datafile for evaluation
    │      └─src_train.csv  # source datafile for training
    ├─src
    │  ├─dataset.py         # dataset loader
    │  ├─embedding.py       # embedding module
    │  ├─metrics.py         # model metrics
    │  ├─path_gen.py        # data preprocessor
    │  ├─recommend.py       # result aggregator
    │  └─tbnet.py           # TB-Net architecture
    ├─export.py             # export MINDIR/AIR script
    ├─preprocess.py         # data pre-processing script
    ├─eval.py               # evaluation script
    ├─infer.py              # inference and explaining script
    ├─train.py              # training script
    └─tbnet_config.py       # configuration reader
```

### Preparing Python Environment

TB-Net is part of the XAI package, so no extra installation is required besides [MindSpore](https://mindspore.cn/install/en) and [XAI](https://www.mindspore.cn/xai/docs/en/master/installation.html). GPUs are supported.

## Data Pre-processing

The complete example code of this step is [preprocess.py](https://gitee.com/mindspore/xai/blob/master/models/whitebox/tbnet/preprocess.py).

Before training TB-Net, we have to convert the source datafiles to relation path data.

### Source Datafile Format

The source datafiles of the steam dataset all share the same CSV format, with a header row:

`user,item,rating,developer,genre,category,release_year`

The first 3 columns must be present, in this order and with these meanings:

- `user`: String, user ID. Records of the same user must be grouped in consecutive rows in a single file; splitting them across different files gives misleading results.
- `item`: String, item ID.
- `rating`: Character, one of `c` (the user interacted with the item, e.g. clicked it, but did not purchase it), `p` (the user purchased the item) or `x` (other items). (Remark: the steam dataset contains no `c` rated items.)

Since the order and meaning of these columns are fixed, their names do not matter. Users may choose other names such as `uid,iid,act`.

The later columns `developer,genre,category,release_year` hold the item's string attribute IDs. Users decide the column names (i.e. relation names) and must keep them consistent across all source datafiles. At least one attribute column is required, and there is no upper limit on their number. When an attribute has more than one value, separate the values with `;`. Leaving an attribute blank means the item has no such attribute.

The contents of the source datafiles differ slightly depending on their purpose:

- `src_train.csv`: For training. The number of rows with `p` rating and the number of rows with `c` + `x` rating should be made roughly equal by re-sampling. There is no need to list all items for every user.
- `src_test.csv`: For evaluation. Very similar to `src_train.csv` but with less data.
- `src_infer.csv`: For inference. It must contain the data of ONLY ONE user, and ALL of that user's `c`, `p` and `x` rating items should be listed. In [preprocess.py](https://gitee.com/mindspore/xai/blob/master/models/whitebox/tbnet/preprocess.py), only the `c` and `x` items are put into the path data as recommendation candidates.

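
The format rules above can be checked before running the pre-processing step. The following is a minimal sketch using only the Python standard library. The helper names (`parse_source_csv`, `users_are_grouped`, `rating_counts`) and the sample rows are hypothetical illustrations and are not part of the XAI package.

```python
import csv
import io
from collections import Counter

VALID_RATINGS = {"c", "p", "x"}


def parse_source_csv(text):
    """Parse source-datafile text into a list of record dicts (hypothetical helper)."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    if len(header) < 4:
        raise ValueError("expected user, item, rating and at least one attribute column")
    # Only the position of the first 3 columns matters; the rest are relation names.
    relation_names = header[3:]
    records = []
    for row in reader:
        if not row:
            continue
        user, item, rating = row[0], row[1], row[2]
        if rating not in VALID_RATINGS:
            raise ValueError(f"unexpected rating {rating!r} for item {item!r}")
        # Multi-valued attributes are separated by ';', a blank means no such attribute.
        attributes = {
            name: [value for value in cell.split(";") if value]
            for name, cell in zip(relation_names, row[3:])
        }
        records.append(
            {"user": user, "item": item, "rating": rating, "attributes": attributes}
        )
    return records


def users_are_grouped(records):
    """Return True if each user's records occupy consecutive rows."""
    seen = set()
    previous = None
    for record in records:
        user = record["user"]
        if user != previous:
            if user in seen:
                return False
            seen.add(user)
            previous = user
    return True


def rating_counts(records):
    """Count rows per rating, e.g. to compare 'p' against 'c' + 'x' in training data."""
    return Counter(record["rating"] for record in records)


# Invented sample rows with renamed header columns, for illustration only.
SAMPLE = """uid,iid,act,developer,genre
u1,g1,p,dev_a,action;indie
u1,g2,x,dev_b,
u2,g1,x,dev_a,action;indie
"""

records = parse_source_csv(SAMPLE)
counts = rating_counts(records)
```

A check like `users_are_grouped` is useful because the format requires that records of the same user sit in consecutive rows, and `rating_counts` gives a quick view of whether the `p` rows and the `c` + `x` rows in a training file are roughly balanced.
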
### Converting to Relation Path Data

```python
import io
import json
from src.path_gen import PathGen

path_gen = PathGen(per_item_paths=39)

path_gen.generate("./data/steam/src_train.csv", "./data/steam/train.csv")

# save id maps for the later use by Recommender for inference
with io.open("./data/steam/id_maps.json", mode="w", encoding="utf-8") as f:
    json.dump(path_gen.id_maps(), f, indent=4)

# treat newly met items and references in src_test.csv and src_infer.csv as unseen entities
# dummy internal id 0 will be assigned to them
path_gen.grow_id_maps = False

path_gen.generate("./data/steam/src_test.csv", "./data/steam/test.csv")

# for inference, only take interacted('c') and other('x') items as candidate items,
# the purchased('p') items won't be recommended.
# assume there is only one user in src_infer.csv
path_gen.subject_ratings = "cx"

path_gen.generate("./data/steam/src_infer.csv", "./data/steam/infer.csv")
```

`PathGen` is responsible for converting source datafile into relation path data.

### Relation Path Data Format

Relation path data are header-less CSV (all integer values), with columns: `,