The design process behind using Natural Language Processing (NLP) to detect similar Filipino words.

In the Philippines, the autocorrect feature on smartphones is more of an annoyance than a utility. It’s common for people to turn autocorrect and autocomplete off altogether because of how frequently we, as a bilingual people, switch between Filipino and English. We tend to mix Filipino words with English ones whenever we make chika with our kumares and kumpares. Colloquial language nowadays is filled with mixtures of English and Tagalog, and the rules of the mixture are arbitrary. New rules and trends on how to combine English and Filipino words emerge every day: we have traditional conyotic Taglish or collegiala English, business and working Filipino/English, kalye or street English, and so much more. This makes things harder for the built-in English word prediction that most smartphones use, rendering it almost useless.

Salita AI
Salita AI was envisioned as a tool for seeing which words are statistically most similar to any chosen word. Word2Vec was modified and trained on corpora scraped from social media to get a rough sample of how Filipino and English are used today.
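To make "statistically most similar" a bit more concrete, here is a minimal sketch of how word vectors are commonly compared, assuming cosine similarity as the measure. The vectors and the `most_similar` helper below are invented for illustration; they are not Salita AI's trained model or its actual code.

```python
import math

# Toy 3-dimensional vectors, made up for illustration only.
# A trained Word2Vec model would hold learned vectors with many more dimensions.
TOY_VECTORS = {
    "kain": [0.9, 0.1, 0.0],
    "eat": [0.8, 0.2, 0.1],
    "lunch": [0.7, 0.3, 0.2],
    "kotse": [0.0, 0.9, 0.4],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word, vectors, topn=3):
    """Rank every other word by its similarity to `word`, highest first."""
    query = vectors[word]
    scored = [
        (other, cosine_similarity(query, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


for neighbor, score in most_similar("kain", TOY_VECTORS):
    print(f"{neighbor}: {score:.2f}")
```

With these toy numbers, "eat" and "lunch" rank well above "kotse" for the parent word "kain", which is the kind of ranked list the interface has to present.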

Since the project needs an interface that can show the information each vector carries, our goal was to create that interface without overwhelming the user with unnecessary information.
The project also needs to demonstrate how this technology works, to hopefully inspire other developers to consider using it for Filipino text and even speech research.

With these goals in mind, we noted two main components that the interface should have: the header and the results. We then went to our design board to draft different interfaces that we might use for this project.


The first draft took a minimalistic approach. The purpose of the design was to be as simple as possible without sacrificing valuable information. While the simplicity helped each element pop out, the design ended up vague and ambiguous because it lacked labels, such as one showing how closely related each result was to the parent word.


For the second draft, we wanted to emphasize the results more by placing them above the header, which effectively turned the header into a footer. This made things more comprehensible and easier to use than the previous iteration. Just like the previous design, however, it sorted the results by how likely they were to be related to the parent word. That made the layout confusing: the result with the highest score was at the top of the page while the parent word itself was displayed at the bottom. Back to the drawing board.

In the final design of the header, we needed to give the user enough information to use Salita AI without ambiguity. This part was pretty straightforward: a set of instructions that isn’t wordy is a must. The search bar is also part of the header, and it is probably the element the user will interact with most.

For the results, we needed to display words similar to the parent word that the user gave. We wanted to emphasize the words nearest to the parent word, which is why we opted to arrange the results by how likely each result word was to be related to the parent word. Another cool feature that we integrated into the design of the results is the color of each result. We set each result’s opacity to match its displayed percentage, producing a gradient-like color scheme.
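The ranking and opacity rule described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `style_results` helper, the base color, and the sample scores are hypothetical, and scores are assumed to fall between 0 and 1.

```python
def style_results(results, base_rgb=(33, 150, 243)):
    """Sort (word, score) pairs by score and derive a display color for each.

    `base_rgb` is an arbitrary placeholder color, not Salita AI's palette.
    Scores are assumed to lie in [0, 1]; anything outside is clamped.
    """
    ranked = sorted(results, key=lambda pair: pair[1], reverse=True)
    red, green, blue = base_rgb
    styled = []
    for word, score in ranked:
        opacity = max(0.0, min(1.0, score))  # opacity mirrors the score
        styled.append(
            {
                "word": word,
                "percent": round(opacity * 100),  # label shown beside the word
                "color": f"rgba({red}, {green}, {blue}, {opacity:.2f})",
            }
        )
    return styled


# Hypothetical similarity scores for a parent word.
sample = [("lunch", 0.62), ("eat", 0.91), ("kotse", 0.10)]
for row in style_results(sample):
    print(row["word"], f'{row["percent"]}%', row["color"])
```

Because the opacity follows the score directly, sorting by score also yields a list that fades from the strongest match to the weakest, which is what gives the gradient-like look.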

It was a short design process, but iterating through the drafts proved valuable in identifying an optimal design for displaying ranked results.