New model provides smishing protection in Swahili

Giordana Verrengia

Sep 15, 2022

By now, “phishing” is likely a familiar concept for people with an online presence: unsolicited emails that ask you to provide sensitive information or prompt you to click a suspicious link, putting your privacy at risk. Its counterpart, smishing (SMS phishing), also deserves attention because it brings these scams to mobile devices through text messages that impersonate institutions like banks in a bid to get your personal data.

Smishing is a particular cybersecurity concern in Africa, the continent that does the most mobile banking in the world. Because transferring funds electronically is so commonplace, an individual has many opportunities in a given day to lose money. Within Sub-Saharan Africa, smishing is especially concentrated in Tanzania and Kenya, where the virtual banking service M-Pesa has a strong presence. The service allows users to send money with cell phones, and as part of each transaction both sender and recipient receive a confirmation notice via SMS.

“The traffic, per day, is insane,” CMU-Africa Assistant Teaching Professor Jema Ndibwile says of the mobile money transfers happening in those two countries. “The higher the prevalence, the higher the risk, and the more people who are vulnerable to smishing.”

Despite projections that mobile money transactions in Sub-Saharan Africa will top $3 billion by the end of 2022, few robust resources exist in Swahili to monitor for smishing activity. Several models to prevent attacks exist in high-resource languages like English and Chinese, but not in the most common language of the most active mobile banking region.

Ndibwile and his fellow researchers, Iddi S. Mambina and Kisangiri F. Michael, both of the Nelson Mandela African Institution of Science and Technology, created a machine-learning-based hybrid model that classifies Swahili text messages sent to mobile money users as either legitimate or smishing. Their research was recently published in IEEE Access. The model boasts an accuracy rating of over 99 percent and was tested using more than 30,000 SMS samples provided by college students.

Smishing leaves anyone with a mobile phone vulnerable, but Ndibwile thinks the responsibility for proactive security measures falls squarely on mobile network operators (MNOs) rather than on users themselves. MNOs are in the best position to detect suspicious content and block it from circulation.

“A user should be protected,” Ndibwile says. “You don’t tell a user, ‘Hey, pay attention when SMS reads like this.’ You need to prevent those kinds of attacks.”

While Ndibwile and his peers focused on creating a machine-learning-based model in Swahili, this is a foundational research advancement. The hybrid model could be used in a plug-in fashion to create systems in other low-resource languages like Kinyarwanda. This single design has the potential to provide cybersecurity reinforcement in languages that are often forgotten.