Natural Language Processing (NLP) Process, Tasks & Techniques Part 2

In our last post, we explained how NLP tools work and why syntactic and semantic analysis are involved throughout the whole process. We will now get into the details of the main NLP tasks and techniques in syntactic and semantic analysis.

Machines cannot decipher human language without the help of syntactic and semantic analysis, whose tasks break human language down into something a machine can read.

Syntactic analysis, or parsing, represents the relationships between words in a diagram called a parse tree, while semantic analysis identifies the meaning behind those words. Below are some of the most common tasks of both syntactic and semantic analysis.

1. Tokenization

Tokenization simplifies a text by breaking it down into tokens, units that are considered semantically useful. Depending on the scale, tokenization splits a whole text into sentences (sentence tokenization) or a sentence into words (word tokenization).

Sample: “Saya merasa sangat puas dengan pelayanan yang diberikan oleh hotel ini.” (English: “I feel very satisfied with the service provided by this hotel.”)

Tokens: “Saya” – “merasa” – “sangat” – “puas” – “dengan” – “pelayanan” – “yang” – “diberikan” – “oleh” – “hotel” – “ini”
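
As a minimal sketch, both kinds of tokenization can be approximated with regular expressions from Python's standard library. The function names sentence_tokenize and word_tokenize are our own for illustration, and the rules are deliberately simple: production tokenizers also handle abbreviations, numbers, and hyphenated or reduplicated words.

```python
import re

def sentence_tokenize(text):
    """Split a text into sentences after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]

def word_tokenize(sentence):
    """Return the runs of word characters in a sentence, dropping punctuation."""
    return re.findall(r"\w+", sentence)

text = "Saya merasa sangat puas dengan pelayanan yang diberikan oleh hotel ini."
print(word_tokenize(text))
print(sentence_tokenize("Selamat pagi. Saya mengalami kendala."))
```

The word tokenizer drops the final period, so the sample sentence yields the same eleven tokens listed above.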

2. Part-of-speech tagging (PoS tagging)

PoS tagging, or part-of-speech tagging, determines the part-of-speech category of each token within a text, tagging it with a label such as verb, adverb, noun, pronoun, or preposition. These labels expose the grammatical relationships between words, which helps in understanding the meaning behind sentences.

Sample: “Saya merasa sangat puas dengan pelayanan yang diberikan oleh hotel ini.”

Tags: Saya [pronoun] merasa [verb] sangat [adverb] puas [adjective] dengan [preposition] pelayanan [noun] yang [relative pronoun] diberikan [verb] oleh [preposition] hotel [noun] ini [pronoun]
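
The simplest possible tagger is a dictionary lookup, sketched below. The LEXICON table and the pos_tag function are hypothetical and cover only the sample sentence; real taggers are trained on annotated corpora and use the surrounding words to resolve ambiguous tokens. The word "yang" is left out of the lexicon on purpose to show what happens to a token the tagger does not know.

```python
# Hypothetical toy lexicon covering only the sample sentence.
# "yang" is omitted on purpose so the fallback tag is visible.
LEXICON = {
    "saya": "pronoun",
    "merasa": "verb",
    "sangat": "adverb",
    "puas": "adjective",
    "dengan": "preposition",
    "pelayanan": "noun",
    "diberikan": "verb",
    "oleh": "preposition",
    "hotel": "noun",
    "ini": "pronoun",
}

def pos_tag(tokens, lexicon=LEXICON, default="unknown"):
    """Tag each token by case-insensitive lookup, falling back to `default`."""
    return [(token, lexicon.get(token.lower(), default)) for token in tokens]

tokens = ["Saya", "merasa", "sangat", "puas", "dengan", "pelayanan",
          "yang", "diberikan", "oleh", "hotel", "ini"]
for token, tag in pos_tag(tokens):
    print(f"{token} [{tag}]")
```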

3. Lemmatization and stemming

For machines to understand our complex language, the forms of the words we speak or write need some adjustment before they are processed. NLP tools use lemmatization to transform words back to their root form, or lemma: the form in which they appear in the dictionary.

Sample: “memberikan” (to give) = beri, “pencarian” (search) = cari, “pepohonan” (trees) = pohon

Stemming, on the other hand, trims words down to their root forms by stripping affixes. The results are less accurate and may not always be semantically correct, but stemming is faster and less complex, which makes it preferable to lemmatization when speed matters more than precision.

Sample: “kebersamaan, bersama, menyamai, disamakan” (togetherness, together, to match, equated) = sama (same)
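
The contrast between the two approaches can be sketched in a few lines. Everything here is an illustrative assumption rather than a real library: LEMMAS is a tiny hand-made dictionary holding only the sample words, and naive_stem strips a handful of Indonesian affixes by rule. Established Indonesian stemmers use far larger rule sets and check candidates against a root-word dictionary.

```python
# Hypothetical lemma dictionary: only the three sample words.
LEMMAS = {"memberikan": "beri", "pencarian": "cari", "pepohonan": "pohon"}

# Hypothetical affix rules, just enough for the stemming sample.
SUFFIXES = ("kan", "an", "i")
# (prefix, replacement): "meny" restores the "s" that the meN- prefix absorbs.
PREFIXES = (("meny", "s"), ("ber", ""), ("ke", ""), ("di", ""), ("me", ""))
MIN_STEM = 3  # never shorten a word below this many letters

def lemmatize(word):
    """Dictionary lookup; unknown words are returned unchanged."""
    word = word.lower()
    return LEMMAS.get(word, word)

def naive_stem(word):
    """Strip at most one suffix, then strip prefixes until none applies."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            word = word[: -len(suffix)]
            break
    stripped = True
    while stripped:
        stripped = False
        for prefix, replacement in PREFIXES:
            rest = replacement + word[len(prefix):]
            if word.startswith(prefix) and len(rest) >= MIN_STEM:
                word = rest
                stripped = True
                break
    return word

for word in ["kebersamaan", "bersama", "menyamai", "disamakan"]:
    print(word, "->", naive_stem(word))
print("memberikan", "->", lemmatize("memberikan"))
```

All four sample words reduce to "sama", but the rules are blind: "makan" (to eat) is cut down to "mak", which is not a word. That is the accuracy cost of stemming, while the dictionary lookup stays correct but only for words it already contains.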

4. Stopword removal

Stop words are high-frequency words that add little to no semantic value to a sentence, such as which, for, to, is, at, and on. Removing them from the text before processing is crucial if you want a noise-free result, especially when you are handling large data sets such as social media comments or customer feedback that need to be categorized by topic.

Sample: “Selamat pagi. Saya mengalami kendala saat sedang melakukan pemesanan tiket.” (English: “Good morning. I ran into a problem while booking a ticket.”)

Stopwords: selamat, pagi, saya, mengalami, saat, sedang, melakukan

Result: kendala pemesanan tiket = main topic
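
In code, stopword removal is a filter over the tokens. The sketch below assumes a hand-made STOPWORDS set containing exactly the words listed in the sample; real stop word lists are much longer and are usually tuned to the domain. The function name remove_stopwords is our own.

```python
import re

# Hypothetical stop word list: exactly the words from the sample above.
STOPWORDS = {"selamat", "pagi", "saya", "mengalami", "saat", "sedang", "melakukan"}

def remove_stopwords(text, stopwords=STOPWORDS):
    """Lowercase the text, tokenize it, and keep only non-stopword tokens."""
    tokens = re.findall(r"\w+", text.lower())
    return [token for token in tokens if token not in stopwords]

text = "Selamat pagi. Saya mengalami kendala saat sedang melakukan pemesanan tiket."
print(remove_stopwords(text))  # ['kendala', 'pemesanan', 'tiket']
```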

5. Text classification

Text classification is one of the most basic NLP tasks. It helps machines understand unstructured data by assigning appropriate categories or tags to a text based on its content. It is widely used in sentiment analysis, one of the services Sonar offers.

Sample:

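“Pelayanan CS di sini buruk sekali!” (“The customer service here is terrible!”) = negative

“Kecepatan internetnya sepertinya baik-baik saja, sih.” (“The internet speed seems fine, I guess.”) = neutral

“Saya sangat menyukai parfum ini.” (“I really like this perfume.”) = positive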
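
To make the idea concrete, here is a rule-based stand-in for a sentiment classifier. It is not how Sonar or any trained model works: the POSITIVE and NEGATIVE word sets are hypothetical mini-lexicons, and the function simply counts matches. Production systems learn from labeled examples and handle negation, sarcasm, and context, which this sketch ignores.

```python
import re

# Hypothetical mini-lexicons, chosen for illustration only.
POSITIVE = {"menyukai", "puas", "bagus"}
NEGATIVE = {"buruk", "kecewa", "lambat"}

def classify_sentiment(text):
    """Label a text by counting positive and negative lexicon hits."""
    tokens = re.findall(r"\w+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

samples = [
    "Pelayanan CS di sini buruk sekali!",
    "Kecepatan internetnya sepertinya baik-baik saja, sih.",
    "Saya sangat menyukai parfum ini.",
]
for sample in samples:
    print(classify_sentiment(sample), "<-", sample)
```

With these lexicons, the three samples come out as negative, neutral, and positive. The neutral result depends on "baik" being absent from the positive list, which shows how fragile keyword rules are compared with a trained classifier.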

With NLP at its core, Sonar can perform more comprehensive and accurate sentiment analysis in Bahasa Indonesia, with up to 83% accuracy, providing you with actionable insights that can help your company detect emerging crises and make data-driven decisions.

Contact us for a personalized demo.
