We built a network representation of CERN's experts (for privacy reasons, all real names have been changed).
Scientists are connected via shared research topics. We crawled the Indico database and extracted keywords describing these topics from the abstracts each expert submitted with their talks.
The network can then be used to search for experts, for connections between experts and topics, and for related topics. This gives an overview of the research done at CERN and helps to find new collaborators and inspiring topics.
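To make the search use case more concrete, here is a minimal sketch. It is not the team's code. It assumes the expert-keyword data is already available as a Python dict that maps each (pseudonymised) expert to a set of keywords. The helpers `build_graph`, `experts_for` and `connection`, as well as the sample names, are hypothetical. The network is stored as a bipartite adjacency map. A breadth-first search then finds the shortest chain of shared topics that links two experts.

```python
from collections import deque


def build_graph(expert_keywords):
    """Build an undirected expert-topic adjacency map (hypothetical helper).

    Nodes are ("expert", name) or ("topic", keyword) tuples, so an expert and
    a topic that share the same string never collide.
    """
    graph = {}
    for expert, keywords in expert_keywords.items():
        e = ("expert", expert)
        graph.setdefault(e, set())
        for kw in keywords:
            t = ("topic", kw)
            graph[e].add(t)
            graph.setdefault(t, set()).add(e)
    return graph


def experts_for(graph, keyword):
    """List the experts directly linked to a topic (empty if the topic is unknown)."""
    return sorted(name for kind, name in graph.get(("topic", keyword), ()) if kind == "expert")


def connection(graph, start, goal):
    """Return the shortest node path from start to goal using BFS, or None."""
    if start not in graph or goal not in graph:
        return None
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor links back to the start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in sorted(graph[node]):  # sorted for deterministic output
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


graph = build_graph({
    "Alice": {"calorimetry", "machine learning"},
    "Bob": {"machine learning", "grid computing"},
    "Carol": {"grid computing"},
})
print(experts_for(graph, "machine learning"))  # → ['Alice', 'Bob']
print(connection(graph, ("expert", "Alice"), ("expert", "Carol")))
```

In a bipartite graph, every path alternates between experts and topics. The BFS result therefore reads directly as "Alice shares *machine learning* with Bob, who shares *grid computing* with Carol".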
Please download "authors_with_keywords.csv" from the CERNBox link below and open the provided Google Colab link. Follow the instructions in the Colab notebook to generate the demo HTML output :)
Team members: Sarvesh, Steffen, Kyungmin, Wun Kwan.
Indico was crawled with the Indico API, and keywords were extracted with the tf-idf algorithm in Python (both done before the Webfest).
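The tf-idf step can be sketched in plain Python. This is an illustration under stated assumptions, not the team's implementation. The tiny stop-word list, the regex tokenizer, the `top_keywords` helper and the sample abstracts are all hypothetical. Each term is scored by its frequency within one abstract multiplied by log(N / df). Here N is the number of abstracts and df is the number of abstracts that contain the term.

```python
import math
import re
from collections import Counter

# A tiny illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "and", "at", "for", "in", "of", "on", "the", "to", "with", "we"}


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def top_keywords(abstracts, k=3):
    """Return the k highest-scoring tf-idf terms for each abstract."""
    docs = [Counter(tokenize(text)) for text in abstracts]
    n_docs = len(docs)
    # Document frequency: in how many abstracts each term appears.
    df = Counter(term for doc in docs for term in doc)
    result = []
    for doc in docs:
        total = sum(doc.values()) or 1
        scores = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in doc.items()
        }
        # Descending score; ties broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result.append(ranked[:k])
    return result


abstracts = [
    "We calibrate the calorimeter with muons.",
    "We train a neural network on calorimeter data.",
    "Grid computing for data storage.",
]
print(top_keywords(abstracts, k=2))
# → [['calibrate', 'muons'], ['network', 'neural'], ['computing', 'grid']]
```

Terms that are shared by many abstracts, such as "calorimeter" and "data" here, are pushed down the ranking. A term that appears in every abstract scores exactly zero. Libraries such as scikit-learn's `TfidfVectorizer` use a smoothed idf by default, so their absolute scores differ from this unsmoothed version.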
During the Webfest, we processed this data with pandas DataFrames.
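The processing step turns each expert's keyword list into one expert-keyword edge per keyword. The team used pandas for this. To keep the example dependency-free, this sketch swaps in Python's built-in `csv` module. It assumes a hypothetical layout for "authors_with_keywords.csv" with an `author` column and a `keywords` column separated by ";". The real file may differ. The `read_edges` helper is likewise hypothetical.

```python
import csv
import io


def read_edges(csv_file, sep=";"):
    """Turn (author, 'kw1;kw2;...') rows into sorted, unique (author, keyword) edges.

    Assumes hypothetical column names "author" and "keywords"; adjust them to
    match the real file.
    """
    edges = set()
    for row in csv.DictReader(csv_file):
        author = row["author"].strip()
        for kw in row["keywords"].split(sep):
            kw = kw.strip().lower()  # normalise spacing and case so duplicates merge
            if author and kw:
                edges.add((author, kw))
    return sorted(edges)


sample = io.StringIO(
    "author,keywords\n"
    "Alice,calorimetry; Machine Learning\n"
    "Bob,machine learning;grid computing\n"
    "Alice,calorimetry\n"
)
print(read_edges(sample))
# → [('Alice', 'calorimetry'), ('Alice', 'machine learning'), ('Bob', 'grid computing'), ('Bob', 'machine learning')]
```

In pandas, the same reshaping is usually done by splitting the keyword column with `Series.str.split` and then calling `DataFrame.explode`, followed by `drop_duplicates`.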
The network representation was built with the pyvis library in Python.
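The sketch below shows one way to hand such edges to pyvis. The input is assumed to be a list of (author, keyword) tuples. The helper names, node colours, shapes and the output filename `experts.html` are hypothetical choices. The pyvis import is placed inside `write_network`, so the pure data-preparation part works even where pyvis is not installed.

```python
def graph_elements(edges):
    """Split (author, keyword) edges into node attributes and prefixed edge pairs.

    Prefixing the ids keeps an expert and a topic with the same name as
    separate nodes in the rendered graph.
    """
    nodes = {}
    links = []
    for author, keyword in edges:
        a_id, k_id = f"expert:{author}", f"topic:{keyword}"
        nodes[a_id] = {"label": author, "color": "#1f77b4", "shape": "dot"}
        nodes[k_id] = {"label": keyword, "color": "#ff7f0e", "shape": "box"}
        links.append((a_id, k_id))
    return nodes, links


def write_network(edges, path="experts.html"):
    """Render the expert-topic graph to a standalone HTML page with pyvis."""
    from pyvis.network import Network  # imported lazily; only this function needs pyvis

    net = Network(height="750px", width="100%")
    nodes, links = graph_elements(edges)
    for node_id, attrs in nodes.items():
        net.add_node(node_id, **attrs)
    for source, target in links:
        net.add_edge(source, target)
    net.write_html(path)
```

With pyvis installed, a call such as `write_network([("Alice", "calorimetry"), ("Bob", "calorimetry")])` writes `experts.html`, which can be opened in a browser. Drawing experts and topics with different shapes and colours makes the bipartite structure visible at a glance.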
- Choosing a suitable visualization method that we could set up within the Webfest;
- Many technical issues (a power cut that lasted all Sunday for Sarvesh in India, connection problems).
- A great team spirit :)
- Creating the network between experts and keywords and seeing it work for the very first time.
- Pitching a project idea;
- Collaborating online across time zones spanning 7 hours;
- Finding relations in data, processing them, and putting them into a graph network representation that suits the data structure well.
Continue working on the project:
- Improve keyword pre-processing (refining the bag of words, comparing more extraction algorithms such as RAKE, n-grams, ...);
- Add a search function;
- Integrate our network representation into CERNSearch; we are already in contact with an expert who will support us :)
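One possible way to refine the bag of words is to add multi-word candidates. A minimal n-gram sketch is shown below. It is only loosely inspired by RAKE, because like RAKE it does not let candidates cross stop words or punctuation. It does not implement RAKE's word-degree scoring. The stop-word list, the `ngram_candidates` helper and the default `n=2` are hypothetical.

```python
import re
from collections import Counter

# Illustrative stop words only; a real list would be much longer.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with", "we"}


def ngram_candidates(text, n=2):
    """Count word n-grams that do not cross a stop word or a punctuation mark."""
    counts = Counter()
    # Split on punctuation first, then on stop words, so phrases never span them.
    for clause in re.split(r"[^\w\s]", text.lower()):
        chunk = []
        for word in clause.split() + [None]:  # None flushes the final chunk
            if word is None or word in STOP_WORDS:
                for i in range(len(chunk) - n + 1):
                    counts[" ".join(chunk[i:i + n])] += 1
                chunk = []
            else:
                chunk.append(word)
    return counts


counts = ngram_candidates("We study dark matter searches. Dark matter and the Higgs boson.")
print(counts.most_common(1))  # → [('dark matter', 2)]
```

Candidates such as "dark matter" could then be scored with tf-idf in the same way as single words. This helps keep multi-word topics from being split into less informative single tokens.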