SMS Character Set Handling and Multipart Messages
This guide explains how the length of an SMS message depends on the character set it uses. Knowing these limits helps you manage message costs and configure your system correctly.
SMS is widely used for sending text messages between mobile phones. The message length is subject to limitations based on the character set used:
- English characters: Maximum message length is 160 characters.
- International characters: Maximum message length is 70 characters.
Both limits follow from the fixed capacity of a single SMS and the number of bits each character needs in the character set used to transmit the message.
SMS Segmentation and Reassembly (SAR)
To support longer text messages, SMS technology includes multipart SMS, involving segmentation and reassembly.
- English Characters:
  - Messages exceeding 160 characters are segmented.
  - Segments are transmitted as multiple SMS messages through the GSM network.
  - The recipient’s mobile phone reassembles and displays the long text as a single message.
- International Characters:
  - Segmentation starts when the message exceeds 70 characters.
Cost Calculation:
- The cost is based on the number of SMS messages used.
- Example: A 240-character English message fits into two SMS messages, costing twice as much as a single 160-character message.
Reduced Message Length for Multipart Messages
Multipart SMS technology reduces the capacity of each segment to accommodate segmentation information:
- English Characters:
  - Each segment can hold only 153 characters.
  - Example: A 320-character message requires three SMS messages (153 + 153 + 14 characters).
- International Characters:
  - Each segment can hold 67 characters.
Example Calculation:
- For a 320-character English message:
  - First 2 segments: 153 characters each (306 characters in total).
  - Third segment: the remaining 14 characters.
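The segment arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not part of any product API: the function name `segment_lengths` and the charset labels `"gsm7"` and `"ucs2"` are hypothetical, and the character count is assumed to already reflect any characters that take two 7-bit characters.

```python
# Limits taken from the text above.
SINGLE_LIMIT = {"gsm7": 160, "ucs2": 70}     # message that fits in one SMS
MULTIPART_LIMIT = {"gsm7": 153, "ucs2": 67}  # per segment once the message is segmented


def segment_lengths(char_count: int, charset: str = "gsm7") -> list[int]:
    """Return the number of characters carried by each SMS segment."""
    if char_count <= SINGLE_LIMIT[charset]:
        return [char_count]  # no segmentation needed
    per_segment = MULTIPART_LIMIT[charset]
    full, rest = divmod(char_count, per_segment)
    return [per_segment] * full + ([rest] if rest else [])


print(segment_lengths(320))         # [153, 153, 14]
print(segment_lengths(240))         # [153, 87]
print(segment_lengths(71, "ucs2"))  # [67, 4]
```

The number of billed SMS messages is then `len(segment_lengths(...))`.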
By understanding these principles, you can effectively manage message costs and ensure proper configuration of your SMS systems.
Terms and Definitions, SAR Technology in Detail
Character Sets and Unicode
English Character Set: The English character set refers to the 7-bit SMS alphabet (see Appendix A), which includes English characters and a few international characters for Western Europe and Greece. These characters are defined in the ETSI GSM 03.38 standard.
International Character Set: The Unicode character set can be used to send special symbols and characters of all languages, including Chinese, Arabic, Hebrew, Cyrillic, and special Eastern European characters.
In the GSM SMS system, an SMS message can contain up to 140 bytes (standard 8-bit bytes) of message data. The 7-bit SMS alphabet allows sending 160 characters in these 140 bytes (160 * 7 = 140 * 8 = 1120 bits). However, certain characters in ETSI GSM 03.38 are represented by two 7-bit characters, so each of them counts as two characters toward the 160-character limit.
Characters that take two 7-bit characters (see Appendix A):
- "^", "{", "}", "\", "[", "]", "~", "|" and "€".
If the message contains characters not in the GSM 7-bit set, such as Chinese or Arabic, the text must be encoded in the Unicode UCS-2 character set, where each character uses 16 bits. This reduces the message length to 70 characters (70 * 16 = 140 * 8).
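As a rough sketch of how a sender might decide between the two character sets, the following Python counts the 7-bit characters a text needs and falls back to UCS-2 when any character is outside the 7-bit alphabet. The alphabet strings are transcribed from GSM 03.38 and should be checked against Appendix A. The function name `classify` is hypothetical, and counting UTF-16 code units for the UCS-2 case is an assumption about how characters outside the Basic Multilingual Plane are handled.

```python
# GSM 7-bit default alphabet, transcribed from GSM 03.38
# (the escape code 0x1B itself is not a printable character and is omitted).
GSM7_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension-table characters: each one is sent as two 7-bit characters.
GSM7_EXTENDED = set("^{}\\[~]|€")


def classify(text: str) -> tuple[str, int]:
    """Return ("gsm7", 7-bit character count) or ("ucs2", 16-bit unit count)."""
    units = 0
    for ch in text:
        if ch in GSM7_BASIC:
            units += 1
        elif ch in GSM7_EXTENDED:
            units += 2
        else:
            # Assumption: count UTF-16 code units for the Unicode fallback.
            return "ucs2", len(text.encode("utf-16-be")) // 2
    return "gsm7", units


print(classify("Hello [world]"))  # ('gsm7', 15)
print(classify("你好"))            # ('ucs2', 2)
```

The returned count, not the visible character count, is what should be compared against the 160 or 70 character limits.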
Segmentation and Reassembly (SAR)
If a message exceeds 140 bytes, SAR technology allows it to be sent as multiple physical SMS messages. The receiving device reassembles these into a single message. Each segment includes a 6-byte SAR header in the User Data Header (UDH) field, reducing the available space.
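The text above does not spell out the layout of the 6-byte header. The sketch below assumes the common concatenation header with an 8-bit reference number, as defined in 3GPP TS 23.040; the function name `concat_udh` is hypothetical and the gateway you use may build this header for you.

```python
def concat_udh(reference: int, total: int, sequence: int) -> bytes:
    """Build the 6-byte concatenation header for one segment.

    Assumed layout (8-bit reference variant):
    UDHL, IEI, IEDL, reference number, total segments, segment number.
    """
    if not 0 <= reference <= 255:
        raise ValueError("reference must fit in one byte")
    if not 1 <= sequence <= total <= 255:
        raise ValueError("need 1 <= sequence <= total <= 255")
    return bytes([
        0x05,       # UDHL: five header bytes follow
        0x00,       # IEI: concatenated short message, 8-bit reference
        0x03,       # IEDL: three data bytes follow
        reference,  # same value in every segment of one message
        total,      # total number of segments
        sequence,   # position of this segment, starting at 1
    ])


print(concat_udh(0x2A, 3, 1).hex())  # 0500032a0301
```

The receiving device groups segments by reference number and orders them by segment number before displaying the text.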
Message Segmentation Examples:
- English Characters: Each segment holds up to 153 characters (140 bytes - 6 bytes for the header = 134 bytes = 1072 bits, which fits 153 7-bit characters).
- Unicode Characters: Each segment holds up to 67 characters (134 bytes at 2 bytes per character).
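The four limits quoted in this guide all come from the same calculation, which can be checked with a short Python sketch. The constant and function names here are illustrative only.

```python
SMS_PAYLOAD_BYTES = 140  # message data available in one SMS
SAR_HEADER_BYTES = 6     # SAR header carried by every segment of a multipart message


def chars_per_sms(bits_per_char: int, header_bytes: int = 0) -> int:
    """Whole characters that fit in the payload left after the header."""
    return (SMS_PAYLOAD_BYTES - header_bytes) * 8 // bits_per_char


print(chars_per_sms(7))                     # 160 (single SMS, 7-bit alphabet)
print(chars_per_sms(16))                    # 70  (single SMS, UCS-2)
print(chars_per_sms(7, SAR_HEADER_BYTES))   # 153 (multipart segment, 7-bit alphabet)
print(chars_per_sms(16, SAR_HEADER_BYTES))  # 67  (multipart segment, UCS-2)
```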
When a long message is sent, SAR headers ensure correct reassembly, but they reduce the available space for message content.
By understanding and applying these principles, you can optimize message delivery and manage costs effectively when using the InteractSMS system.