Exit e-book
Show all chapters
02
Methodology
02. 
Methodology

You are viewing the old version of the ranking

To see the new, updated version, click below. If you want to proceed - just close the popup window. Enjoy!

Go to the new ranking
Top Functional Programming Languages Ranking Based On Sentiment Analysis 2021
02

Methodology

We’ve collected data on 12 languages – Kotlin, Scala, Clojure, Erlang, Swift, Elixir, Haskell, Rust, OCaml, Elm, PureScript, and Idris – from the functional programming family based on keywords, topics, hashtags, and @mentions across blogs, news sites, and social networks.

We analyzed 123084 mentions concerning these languages to find out which of the functional programming languages are loved by people, in other words, which have the most positive sentiment among all of the languages we analyzed and which have the most negative sentiment.

Specifically, we looked at the percent of positive sentiment, and negative sentiment. Using data from Brand24, we uncovered some very interesting findings, and now we want to share them with you. But first, for those who are curious about our methods and the Brand24 sentiment analysis, we prepared a short overview of the methodology.

Can’t wait to see the outcomes?

Jump straight to the ranking

https://scalac.io/ebook/functional-programming-languages-sentiment-analysis/functional-programming-languages-ranking-2021/

Not a fan of reading online?

Download PDF

 

Brand24 Sentiment Analysis

The General Idea 

The sentiment analysis mechanism is based on the most popular achievements related to artificial intelligence – i.e., deep learning networks and embedding. In short, the texts that make up the mentions are converted into numerical sequences that are much more understandable to the algorithm than pure text strings. The second neural network was based on embeddings, which we learned to distinguish between sentiment based on several hundred thousand examples. The entire text in the found entry is analyzed.

For example – thanks to such advanced technology, the tool can distinguish between “thin footballer” and “thin TV” entries. In the first case (a footballer), negative sentiment will be determined, while in the case of a thin TV set, the sentiment will be positive.

First stage.

Dictionary analysis. In their database, Brand24 stores several thousand words related to emotions – both positive and negative. Each word has an assigned weight. In the first stage, we divide the content into sentences and analyze each of them separately in terms of the presence of these words.

Stage two.

Linguistic analysis. Implementation of language rules and the context analysis mentioned above. It is well-known that the algorithm finding the word “good” should check if the word “day” has not appeared before or after it, etc. The algorithm currently contains a lot of additional parameters, such as the distance of positive or negative words from the search subject.

Only those results which the system has qualified according to these parameters are marked as positive or negative. 

What are the sources of mentions? Where does the data come from?
Functional Languages_Ranking_Sentiment Analysis Media

What Steps We Took To Rank The Programming Languages

Choosing the languages

Most of the languages that we chose feel like natural choices. However, Kotlin, Rust and Swift might not seem obvious. They are not traditionally considered functional languages, but we chose them because to some extent they support this programming style. In addition, as the process of data extraction is complex and prone to errors, we were not able to include F# and Reason in our results, but we hope to change this in future editions and updates. 

Why Is The Ranking Based On 123084 Mentions Not  963941?

As we’ve already mentioned, this report is based o 123084 mentions. This is the number of mentions we consider “qualified” for analytical purposes based on the filters we applied to filter the unwanted content – we will explain this later on in this paragraph.  963941 mentions were aggregated by the Brand24 tool in the period starting on November 1, 2020, till January 31, 2021. As you can see, not all of them were included in the analysis. 

The platform by Brand24 is based on dictionary analysis, as mentioned above. Because of this, the gathered data included not only mentions of programming languages, but all the mentions that included the keywords in the project (each language was a separate project). The keywords usually included one or two forms (if applicable) of the programming language name. For example “Scala” was only one separate keyword, but “PureScript” appeared in two forms “PureScript” and “Pure Script”. On the keyword lvl the tool differentiates these two things. 

Besides keywords, we also included “excluded keywords” where we knew straight away that the meaning of the keyword is broad. For Scala, the natural ones were for example “La Scala” and for Elm, it was “street”(A Nightmare on Elm Street) and “tree”. 

functional programming languages
functional programming languages

As we started to analyze all of the mentions, we quickly realized the amount of noise in our data was too big to solve with exclusions, so we decided to apply custom filters that enabled us to analyze only the content related to the IT industry. 

 

What is this “custom filter”?

An example of a custom filter looks like this: 

Functional Programming Languages Ranking ​​| Sentiment Analysis

As you can probably tell, the filter lets you filter the data by telling the algorithm to search for mentions that not only have the keyword included, but also at least one of the terms defined in the filter. This enabled us to get to the real “meat”. Of course, this method is not 100% perfect, but it definitely gives a statistical overview of the way each language is represented on the internet. 

How did we create the filters? 
Every filter was based on a few factors
  • General phrases connected with every programming language 
  • Specific phrases characteristic to the language – for example libraries
  • Other languages in the ranking 
  • Most popular hashtags associated with the language 

 

This was our first time creating the ranking, but we would love to continue, so if you have any suggestions when it comes to the filters or our approach – let us know. 

How did we decide the positions in the functional programming languages ranking? 
  1. The percentage of mentions with negative sentiment
  2. The percentage of mentions with positive sentiment
scala experts
PREVIOUS
Chapter
01
NEXT
Chapter
03