Natural Language Processing (NLP) Process, Tasks & Techniques Part 2


In our previous post, we explained how NLP tools work and why syntactic and semantic analysis are involved throughout the whole process. We will now get into the details of the main NLP tasks and techniques in syntactic and semantic analysis.

Machines cannot decipher human language without the help of syntactic and semantic analysis, whose tasks include breaking language down into something a machine can read.

Syntactic analysis, or parsing for short, represents the relationships between words in a diagram called a parse tree, while semantic analysis identifies the meaning behind those words. Below are some of the most common tasks of both syntactic and semantic analysis.

1. Tokenization

Tokenization is the process of simplifying a text by breaking it down into tokens, units that are considered semantically useful. Depending on the scale, tokenization splits a whole text into sentences (sentence tokenization) or a sentence into words (word tokenization).

Sample: “Saya merasa sangat puas dengan pelayanan yang diberikan oleh hotel ini.” (English: “I am very satisfied with the service provided by this hotel.”)

Tokens: “Saya” – “merasa” – “sangat” – “puas” – “dengan” – “pelayanan” – “yang” – “diberikan” – “oleh” – “hotel” – “ini”
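As a rough illustration of the two scales described above, here is a minimal sketch that uses only regular expressions from the Python standard library. The function names `sentence_tokenize` and `word_tokenize` are our own, and the rules are deliberately simple: real tokenizers also deal with abbreviations, hyphenated words, URLs, emoji and more.

```python
import re

def sentence_tokenize(text):
    """Split a text into sentences at '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def word_tokenize(sentence):
    """Return runs of letters, digits and underscores, dropping punctuation."""
    return re.findall(r"\w+", sentence)

text = "Saya merasa sangat puas dengan pelayanan yang diberikan oleh hotel ini."
print(sentence_tokenize("Selamat pagi. Saya mengalami kendala."))
print(word_tokenize(text))
```

Applied to the sample sentence, `word_tokenize` yields the same eleven tokens listed above. Note that this simple pattern would split a hyphenated word such as "baik-baik" into two tokens, which may or may not be what you want.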

2. Part-of-speech tagging (PoS tagging)

Part-of-speech (PoS) tagging determines the part-of-speech category of each token within a text, labeling it as a verb, adverb, noun, pronoun, preposition, and so on. These labels help later steps work out how the words relate to one another and, in turn, what the sentence means.

Sample: “Saya merasa sangat puas dengan pelayanan yang diberikan oleh hotel ini.”

Tags: Saya [pronoun] merasa [verb] sangat [adverb] puas [adjective] dengan [preposition] pelayanan [noun] yang [relative pronoun] diberikan [verb] oleh [preposition] hotel [noun] ini [pronoun]
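To make the idea concrete, here is a toy tagger that simply looks each token up in a hand-written lexicon. Everything in it is an assumption made for illustration: the `LEXICON` entries, the `pos_tag` name and the "unknown" fallback are ours, and the lexicon is deliberately incomplete so the fallback is visible. Production taggers are usually statistical or neural models that use the surrounding context rather than a fixed word list.

```python
# Hypothetical hand-written lexicon covering part of the sample sentence.
LEXICON = {
    "saya": "pronoun", "merasa": "verb", "sangat": "adverb",
    "puas": "adjective", "dengan": "preposition", "pelayanan": "noun",
    "diberikan": "verb", "oleh": "preposition", "hotel": "noun",
    "ini": "pronoun",
}

def pos_tag(tokens, lexicon=LEXICON, default="unknown"):
    """Pair every token with its lexicon tag, or `default` if it is missing."""
    return [(tok, lexicon.get(tok.lower(), default)) for tok in tokens]

tokens = "Saya merasa sangat puas dengan pelayanan yang diberikan oleh hotel ini".split()
for token, tag in pos_tag(tokens):
    print(f"{token} [{tag}]")
```

A lookup like this cannot handle words it has never seen, nor words whose category depends on context, which is why real systems learn tags from annotated text.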

3. Lemmatization and stemming

For machines to understand our complex language, the words we speak or write need to be adjusted before they are processed. NLP tools use lemmatization to transform words back to their lemma, the form in which they appear in the dictionary.

Sample: “memberikan” (to give) = beri, “pencarian” (search) = cari, “pepohonan” (trees) = pohon

Stemming, on the other hand, trims affixes from words to reach a root form. The results are less accurate than lemmatization and may not always be semantically correct, but stemming is faster and less complex, which makes it preferable when speed matters more than precision.

Sample: “kebersamaan, bersama, menyamai, disamakan” = sama (“same”)
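The contrast between the two approaches can be sketched in a few lines. In the snippet below, lemmatization is a dictionary lookup and stemming is a handful of affix-stripping rules. The `LEMMAS` table, the `PREFIX_RULES` and `SUFFIXES` lists and the `MIN_STEM` limit are all hypothetical and far smaller than what a real Indonesian stemmer uses, so treat this as a sketch of the idea rather than a usable tool.

```python
# Lemmatization sketch: a dictionary lookup built from the samples in this section.
LEMMAS = {"memberikan": "beri", "pencarian": "cari", "pepohonan": "pohon"}

def lemmatize(word):
    """Return the dictionary form if known, else the lowercased word unchanged."""
    w = word.lower()
    return LEMMAS.get(w, w)

# Stemming sketch: hypothetical affix rules, not a complete Indonesian rule set.
# Each prefix maps to the text that replaces it ("meny" restores the dropped "s").
PREFIX_RULES = [("meny", "s"), ("mem", ""), ("men", ""), ("me", ""),
                ("ber", ""), ("ke", ""), ("di", ""), ("pe", "")]
SUFFIXES = ["kan", "an", "i"]
MIN_STEM = 3  # never shrink a word below three letters

def stem(word):
    w = word.lower()
    # Strip at most one suffix.
    for suf in SUFFIXES:
        if w.endswith(suf) and len(w) - len(suf) >= MIN_STEM:
            w = w[:-len(suf)]
            break
    # Strip up to two stacked prefixes (for example ke- followed by ber-).
    for _ in range(2):
        for pre, repl in PREFIX_RULES:
            if w.startswith(pre) and len(w) - len(pre) + len(repl) >= MIN_STEM:
                w = repl + w[len(pre):]
                break
    return w

for w in ["kebersamaan", "bersama", "menyamai", "disamakan", "makan"]:
    print(w, "->", stem(w))
```

The four sample words all reduce to "sama", but the same rules turn "makan" (to eat) into "mak" because the ending "-an" looks like a suffix. That is the kind of error the paragraph above refers to: the rules are fast and need no dictionary, but they know nothing about which results are real words.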

4. Stopword removal

Stopwords are high-frequency words that add little to no semantic value to a sentence, such as which, for, to, is, at, and on. Removing them from the text you want to process is important if you want a noise-free result, especially when you are handling large sets of data such as social media comments or customer feedback that need to be categorized by topic.

Sample: “Selamat pagi. Saya mengalami kendala saat sedang melakukan pemesanan tiket.” (English: “Good morning. I ran into a problem while booking a ticket.”)

Stopwords: selamat, pagi, saya, mengalami, saat, sedang, melakukan

Result: kendala pemesanan tiket (“ticket booking problem”) = main topic
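In code, stopword removal is little more than a filter against a word list. The sketch below reuses the stopwords from the sample above; the `STOPWORDS` set and the `remove_stopwords` name are illustrative, and a real project would typically start from a much longer, curated list for the language in question.

```python
import re

# Stopword list taken from the sample above; real lists are far longer.
STOPWORDS = {"selamat", "pagi", "saya", "mengalami", "saat", "sedang", "melakukan"}

def remove_stopwords(text, stopwords=STOPWORDS):
    """Lowercase the text, tokenize it, and keep only non-stopword tokens."""
    tokens = re.findall(r"\w+", text.lower())
    return [tok for tok in tokens if tok not in stopwords]

sample = "Selamat pagi. Saya mengalami kendala saat sedang melakukan pemesanan tiket."
print(remove_stopwords(sample))
```

For the sample sentence this leaves "kendala", "pemesanan" and "tiket", matching the result shown above.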

5. Text classification

Text classification is one of the most basic NLP tasks. It helps machines understand unstructured data by assigning appropriate categories or tags to a text based on its content. This task is widely used in sentiment analysis, one of the services Sonar offers.

Sample:

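“Pelayanan CS di sini buruk sekali!” (“The customer service here is terrible!”) = negative

“Kecepatan internetnya sepertinya baik-baik saja, sih.” (“The internet speed seems fine, I guess.”) = neutral

“Saya sangat menyukai parfum ini.” (“I really like this perfume.”) = positive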
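One very simple way to assign such labels is to count words from small positive and negative word lists. The sketch below does exactly that. The `POSITIVE` and `NEGATIVE` sets and the `classify_sentiment` function are hypothetical and only cover the samples above; this is not how Sonar or any production system works, since those typically rely on models trained on large amounts of labeled text.

```python
import re

# Hypothetical mini-lexicons, chosen only to cover the samples above.
POSITIVE = {"menyukai", "puas", "bagus"}
NEGATIVE = {"buruk", "kecewa", "lambat"}

def classify_sentiment(text):
    """Label a text by comparing counts of positive and negative words."""
    tokens = re.findall(r"\w+", text.lower())
    score = sum(tok in POSITIVE for tok in tokens) - sum(tok in NEGATIVE for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

samples = [
    "Pelayanan CS di sini buruk sekali!",
    "Kecepatan internetnya sepertinya baik-baik saja, sih.",
    "Saya sangat menyukai parfum ini.",
]
for s in samples:
    print(classify_sentiment(s), "<-", s)
```

A word-counting approach like this misses negation, sarcasm and slang, which is one reason trained classifiers are preferred for real sentiment analysis.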

With NLP at its core, Sonar can perform more comprehensive and accurate sentiment analysis in Bahasa Indonesia, with up to 83% accuracy. This gives you actionable insights that can help your company detect upcoming crises and make data-driven decisions.
