PodPaper

Tagline
The software that turns your academic papers and pictures into an interactive podcast!
Description

PodPaper turns your academic papers and pictures into an interactive podcast you can listen to! You give it a picture or a PDF, and it reads the content aloud in a much more natural order than copying and pasting the text into Google Translate. It pauses at equations and images, reads math equations out loud, and jumps between different points in the text based on context, much as a human reader would!
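As a rough illustration of the "pause at equations" idea, here is a minimal sketch in Python. It is not the project's actual code: it assumes the extracted text marks equations with `$...$` or `$$...$$`, and the names `Segment`, `segment_text`, and `math_pause_s` are hypothetical.

```python
import re
from dataclasses import dataclass

# Assumption: equations show up in the extracted text as $...$ or $$...$$.
MATH_PATTERN = re.compile(r"\$\$(.+?)\$\$|\$(.+?)\$", re.DOTALL)


@dataclass
class Segment:
    kind: str       # "prose" or "math"
    text: str       # content to hand to the speech engine
    pause_s: float  # silence to insert before reading this segment


def segment_text(text, math_pause_s=0.8):
    """Split text into prose and math segments, pausing before each equation."""
    segments = []
    pos = 0
    for match in MATH_PATTERN.finditer(text):
        prose = text[pos:match.start()].strip()
        if prose:
            segments.append(Segment("prose", prose, 0.0))
        # Group 1 holds display math ($$...$$), group 2 holds inline math ($...$).
        body = (match.group(1) or match.group(2)).strip()
        segments.append(Segment("math", body, math_pause_s))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        segments.append(Segment("prose", tail, 0.0))
    return segments


if __name__ == "__main__":
    for seg in segment_text("Energy is $E = mc^2$ in physics."):
        print(seg.kind, repr(seg.text), seg.pause_s)
```

A reader loop could then sleep for `pause_s` seconds before speaking each segment, and send "math" segments through a separate equation-to-speech step.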

Technologies used

Mainly Python and JavaScript, with some Lua here and there. We used Anaconda, PyTorch, pyttsx3, im2markup, and pytesseract.
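To give a feel for how two of these packages can fit together, here is a hedged sketch of an OCR-to-speech path using pytesseract and pyttsx3. It is an illustration under assumptions, not the project's pipeline: the helper names `tidy_ocr_text` and `read_image_aloud` and the speech rate are hypothetical, and the third-party imports are kept inside the function so the text clean-up step can be tried on its own.

```python
import re


def tidy_ocr_text(raw):
    """Turn raw OCR output into a list of clean paragraphs."""
    # Re-join words that were hyphenated across a line break.
    joined = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Blank lines separate paragraphs; collapse all other whitespace.
    paragraphs = re.split(r"\n\s*\n", joined)
    return [" ".join(p.split()) for p in paragraphs if p.strip()]


def read_image_aloud(image_path, rate=170):
    """OCR an image and speak it paragraph by paragraph.

    Assumes Pillow, pytesseract (plus the Tesseract binary), and pyttsx3
    are installed; they are imported here rather than at module level.
    """
    from PIL import Image
    import pytesseract
    import pyttsx3

    raw = pytesseract.image_to_string(Image.open(image_path))
    engine = pyttsx3.init()
    engine.setProperty("rate", rate)  # speaking rate in words per minute
    for paragraph in tidy_ocr_text(raw):
        engine.say(paragraph)
    engine.runAndWait()  # block until all queued speech has been spoken


if __name__ == "__main__":
    sample = "Neural net-\nworks are\nuseful.\n\nSecond paragraph."
    print(tidy_ocr_text(sample))
```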

Obstacles

Hardware constraints when training an AI model. Getting open-source packages written in several languages to work together. Language processing for LaTeX.

Accomplishments

Teamwork makes the dream work. Excellent team cooperation and synergy were a keystone of making a project with so many moving parts feasible within the limited time frame. Making full use of every team member was a challenge we rose to!

Learnings

Dynamically allocating teammates to problems of varying difficulty was a constant challenge throughout the project. On the technical side, each of us learned a lot about our own part of the project: generating data for and training an AI model, natural-sounding speech generation, interfacing code written in other languages with Python, and so on.

Next steps

To fully flesh out and train the neural network and improve its ability to characterize figures, subfigures, tables, etc. To improve the math TTS system's support for more complex and varied LaTeX symbols and grammar.
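To make the math TTS goal concrete, here is a small sketch of how LaTeX could be turned into speakable text with a symbol table and a few rewrite rules. It is an assumption-laden toy, not the existing system: the table `SPOKEN`, the function `latex_to_speech`, and the chosen phrasings are all hypothetical, and it covers only a handful of constructs.

```python
import re

# Hypothetical symbol table; a fuller system would cover many more commands.
SPOKEN = {
    r"\alpha": "alpha",
    r"\beta": "beta",
    r"\pi": "pi",
    r"\sum": "the sum of",
    r"\times": "times",
    r"\leq": "less than or equal to",
    "=": "equals",
    "+": "plus",
    "-": "minus",
}

FRAC = re.compile(r"\\frac\{([^{}]+)\}\{([^{}]+)\}")
SQRT = re.compile(r"\\sqrt\{([^{}]+)\}")
POWER = re.compile(r"\^\{([^{}]+)\}|\^(\w)")
TOKEN = re.compile(r"\\[A-Za-z]+|[=+\-]")


def latex_to_speech(expr):
    """Convert a small subset of LaTeX math into a speakable phrase."""
    previous = None
    # Expand innermost structures first and repeat until nothing changes,
    # so nested fractions, roots, and exponents are handled.
    while previous != expr:
        previous = expr
        expr = FRAC.sub(r" \1 over \2 ", expr)
        expr = SQRT.sub(r" the square root of \1 ", expr)
        expr = POWER.sub(
            lambda m: " to the power of " + (m.group(1) or m.group(2)) + " ",
            expr,
        )
    # Replace known symbols; unknown commands are read by their bare name.
    expr = TOKEN.sub(
        lambda m: " " + SPOKEN.get(m.group(0), m.group(0).lstrip("\\")) + " ",
        expr,
    )
    return " ".join(expr.split())


if __name__ == "__main__":
    print(latex_to_speech(r"E = mc^2"))
    print(latex_to_speech(r"\frac{a}{b} + \sqrt{x}"))
```

Growing the table and adding rules for constructs such as subscripts, integrals, and matrices is where support for "more complex and varied" LaTeX would come in.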

One-minute video