Blogs | 02 Dec 2021

Natural Language Processing (NLP) Process, Tasks & Techniques Part 2


In our previous post, we explained how NLP tools work and why syntactic and semantic analysis are involved throughout the whole process. This post goes into the details of the main NLP tasks and techniques used in syntactic and semantic analysis.

Machines cannot decipher human language without syntactic and semantic analysis, whose tasks include breaking language down into something a machine can read.

Syntactic analysis, or parsing, represents the relationships between words in a diagram called a parse tree, while semantic analysis identifies the meaning behind those words. Below are some of the most common tasks in both syntactic and semantic analysis.

1. Tokenization

Tokenization simplifies a text by breaking it down into tokens: units that are considered semantically useful. Depending on the scale, it is used to split a text into sentences (sentence tokenization) or a sentence into words (word tokenization).

Sample: “Saya merasa sangat puas dengan pelayanan yang diberikan oleh hotel ini.”

Tokens: “Saya” - “merasa” - “sangat” - “puas” - “dengan” - “pelayanan” - “yang” - “diberikan” - “oleh” - “hotel” - “ini”
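As a rough illustration, both kinds of tokenization can be sketched with Python's standard `re` module. The function names `sentence_tokenize` and `word_tokenize` are our own for this example, and the rules are deliberately naive (they would mishandle abbreviations such as "Dr."), so treat this as a sketch rather than how a production tokenizer works.

```python
import re

def sentence_tokenize(text):
    # Naive rule: a sentence ends at ".", "!" or "?" followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def word_tokenize(sentence):
    # Keep runs of letters/digits/underscores; punctuation is dropped.
    return re.findall(r"\w+", sentence)

text = "Saya merasa sangat puas dengan pelayanan yang diberikan oleh hotel ini."
print(word_tokenize(text))
```

Calling `word_tokenize` on the sample sentence yields the eleven tokens listed above, without the final period.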

2. Part-of-speech tagging (PoS tagging)

Part-of-speech (PoS) tagging determines the part-of-speech category of each token in a text, labeling it as a verb, adverb, noun, pronoun, preposition, and so on. These labels help reveal the relationships between words, which in turn helps in understanding the meaning of a sentence.

Sample: “Saya merasa sangat puas dengan pelayanan yang diberikan oleh hotel ini.”

Tags: Saya [pronoun] merasa [verb] sangat [adverb] puas [adjective] dengan [preposition] pelayanan [noun] yang [conjunction] diberikan [verb] oleh [preposition] hotel [noun] ini [pronoun]
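The simplest possible tagger is a dictionary lookup, sketched below. The `LEXICON` is a hypothetical, hand-written word list that covers only a few words from the sample; real taggers are usually statistical models that also look at the surrounding words, which this sketch does not do.

```python
import re

# Hypothetical hand-written lexicon for illustration only.
LEXICON = {
    "saya": "pronoun",
    "merasa": "verb",
    "sangat": "adverb",
    "puas": "adjective",
    "dengan": "preposition",
    "pelayanan": "noun",
    "diberikan": "verb",
    "oleh": "preposition",
    "hotel": "noun",
}

def pos_tag(sentence, lexicon=LEXICON):
    # Tokenize, then look each token up; words missing from the
    # lexicon are labeled "unknown" instead of being guessed.
    tokens = re.findall(r"\w+", sentence)
    return [(token, lexicon.get(token.lower(), "unknown")) for token in tokens]

print(pos_tag("Saya merasa sangat puas dengan pelayanan hotel."))
```

Because a lookup ignores context, it cannot resolve words that belong to more than one category, which is the main reason practical taggers are trained on annotated text.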

3. Lemmatization and stemming

For machines to understand our complex language, the words we speak or write need to be normalized before they are processed. NLP tools use lemmatization to transform words back to their root form, or lemma: the form in which a word appears in the dictionary.

Sample: “memberikan” = beri, “pencarian” = cari, “pepohonan” = pohon

Stemming, on the other hand, trims affixes from words to reach a root form. The result is less accurate and may not always be semantically correct, but stemming is faster and less complex than lemmatization, so it is often preferred when speed matters more than precision.

Sample: “kebersamaan, bersama, menyamai, disamakan” = sama
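The difference between the two approaches can be sketched as follows. Lemmatization is shown here as a dictionary lookup and stemming as blind affix stripping. The `LEMMAS` table and the affix lists are assumptions made up for this example; they cover only the samples in this post and are far from a complete description of Indonesian morphology.

```python
# Toy lemma dictionary (hand-written for this example).
LEMMAS = {"memberikan": "beri", "pencarian": "cari", "pepohonan": "pohon"}

# Illustrative affix rules only; "meny" is rewritten to "s" (menyamai -> samai).
SUFFIXES = ("kan", "an", "i")
PREFIXES = (("meny", "s"), ("ke", ""), ("ber", ""), ("di", ""))

def lemmatize(word):
    # Dictionary lookup: accurate for known words, unchanged otherwise.
    return LEMMAS.get(word.lower(), word.lower())

def stem(word, min_len=4):
    word = word.lower()
    # Strip at most one suffix, keeping at least min_len characters.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            word = word[: -len(suffix)]
            break
    # Strip prefixes repeatedly until no rule applies.
    changed = True
    while changed:
        changed = False
        for prefix, replacement in PREFIXES:
            candidate = replacement + word[len(prefix):]
            if word.startswith(prefix) and len(candidate) >= min_len:
                word = candidate
                changed = True
                break
    return word

for w in ("kebersamaan", "bersama", "menyamai", "disamakan"):
    print(w, "->", stem(w))
```

All four sample words reduce to "sama". The same rules also turn "kepala" (head) into "pala", which shows why stemming is fast but not always semantically correct.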

4. Stopword removal

Stop words are high-frequency words that add little to no semantic value to a sentence, such as which, for, to, is, at, and on. Removing them from the text you want to process is important if you want a less noisy result, especially when you are handling large sets of data such as social media comments or customer feedback that need to be categorized by topic.

Sample: “Selamat pagi. Saya mengalami kendala saat sedang melakukan pemesanan tiket.”

Stopwords: selamat, pagi, saya, mengalami, saat, sedang, melakukan

Result: kendala pemesanan tiket = main topic
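A minimal sketch of this step is a filter against a stop word set. The `STOPWORDS` set below contains only the words from the sample above; in practice a much larger, curated list for the target language would be used.

```python
import re

# Stop words taken from the sample above; a real list would be far longer.
STOPWORDS = {"selamat", "pagi", "saya", "mengalami", "saat", "sedang", "melakukan"}

def remove_stopwords(text, stopwords=STOPWORDS):
    # Lowercase and tokenize, then keep only tokens that are not stop words.
    tokens = re.findall(r"\w+", text.lower())
    return [token for token in tokens if token not in stopwords]

sample = "Selamat pagi. Saya mengalami kendala saat sedang melakukan pemesanan tiket."
print(remove_stopwords(sample))
```

For the sample sentence, the remaining tokens are "kendala", "pemesanan", and "tiket".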

5. Text classification

Text classification is one of the most basic NLP tasks. It helps machines make sense of unstructured data by assigning appropriate categories or tags to a text based on its content. It is widely used in sentiment analysis, one of the services that Sonar offers.


“Pelayanan CS di sini buruk sekali!” = negative

“Kecepatan internetnya sepertinya baik-baik saja, sih.” = neutral

“Saya sangat menyukai parfum ini.” = positive
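To make the idea concrete, here is a minimal rule-based sketch that classifies a text by counting sentiment words. It is not how Sonar's service works; the word lists `POSITIVE_WORDS` and `NEGATIVE_WORDS` are tiny, hypothetical examples, and practical classifiers are typically trained on labeled data instead.

```python
import re

# Hypothetical sentiment word lists for illustration only.
POSITIVE_WORDS = {"menyukai", "puas", "bagus"}
NEGATIVE_WORDS = {"buruk", "kecewa", "lambat"}

def classify_sentiment(text):
    tokens = re.findall(r"\w+", text.lower())
    # Score = number of positive words minus number of negative words.
    score = sum(t in POSITIVE_WORDS for t in tokens) - sum(t in NEGATIVE_WORDS for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

for sample in (
    "Pelayanan CS di sini buruk sekali!",
    "Kecepatan internetnya sepertinya baik-baik saja, sih.",
    "Saya sangat menyukai parfum ini.",
):
    print(sample, "=", classify_sentiment(sample))
```

A word-counting approach like this cannot handle negation or sarcasm, which is one reason sentiment analysis usually relies on the preprocessing steps above combined with a trained model.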

With NLP at its core, Sonar can perform more comprehensive and accurate sentiment analysis in Bahasa Indonesia with up to 83% accuracy, providing you with actionable insights that can help your company detect emerging crises and make data-driven decisions.

Contact us for a personalized demo.


PT Sonar Analitika Indonesia is a Dataxet Pte. Ltd. company.