Talk:List of Unicode characters

This is the talk page for discussing improvements to the List of Unicode characters article.
This is not a forum for general discussion of the article's subject.

This article was nominated for deletion. Please review the prior discussions if you are considering re-nomination:

Keep (revision kept), 26 October 2007, see discussion.
Keep (revision kept), 22 September 2007, see discussion.
Keep (revision kept), 22 April 2007, see discussion.

Text and/or other creative content from this version of Unified Canadian Aboriginal Syllabics character table was copied or moved into List of Unicode characters with this edit on 20:26, 21 December 2007. The former page's history now serves to provide attribution for that content in the latter page, and it must not be deleted as long as the latter page exists.

Text and/or other creative content from this version of List of Unicode characters was copied or moved into Dingbat with this edit on 22:55, 3 February 2015. The former page's history now serves to provide attribution for that content in the latter page, and it must not be deleted as long as the latter page exists.

Text and/or other creative content from this version of List of Unicode characters was copied or moved into Miscellaneous Mathematical Symbols-A with this edit on 19:31, 3 February 2015. The former page's history now serves to provide attribution for that content in the latter page, and it must not be deleted as long as the latter page exists.

Text and/or other creative content from this version of List of Unicode characters was copied or moved into Unicode and HTML for the Hebrew alphabet with this edit on 20:59, 4 February 2015. The former page's history now serves to provide attribution for that content in the latter page, and it must not be deleted as long as the latter page exists.

Text and/or other creative content from this version of List of Unicode characters was copied or moved into Arabic script in Unicode with this edit on 16:22, 6 February 2015. The former page's history now serves to provide attribution for that content in the latter page, and it must not be deleted as long as the latter page exists.

Text and/or other creative content from this version of List of Unicode characters was copied or moved into Syriac (Unicode block) with this edit on 18:06, 6 February 2015. The former page's history now serves to provide attribution for that content in the latter page, and it must not be deleted as long as the latter page exists.

Text and/or other creative content from this version of List of Unicode characters was copied or moved into Block Elements with this edit on 18:33, 6 February 2015. The former page's history now serves to provide attribution for that content in the latter page, and it must not be deleted as long as the latter page exists.

Text and/or other creative content from this version of List of Unicode characters was copied or moved into Spacing Modifier Letters with this edit on 21:25, 8 February 2015. The former page's history now serves to provide attribution for that content in the latter page, and it must not be deleted as long as the latter page exists.

Why is U+00A0 not in the control character section?

Its function is that of a control character, no? — Preceding unsigned comment added by 76.81.249.42 (talk) 01:52, 9 October 2019 (UTC)

U+00A0 has a general category of Zs (Separator, space), not Cc (Other, control) per UnicodeData.txt. BTW: I've removed U+0020 from the control character section's table because it too has a Unicode general category of Zs and the text before the table correctly states there are "65 characters, including DEL but not SP". DRMcCreedy (talk) 04:13, 9 October 2019 (UTC)
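
For anyone who wants to check these categories locally, here is a minimal sketch in Python. It assumes the standard-library unicodedata module, whose Unicode version depends on the Python build, instead of parsing UnicodeData.txt directly. The helper names general_category and count_control_characters are made up for illustration.

```python
import unicodedata


def general_category(code_point: int) -> str:
    """Return the two-letter Unicode general category of a code point."""
    return unicodedata.category(chr(code_point))


def count_control_characters() -> int:
    """Count every code point whose general category is Cc."""
    return sum(
        1 for cp in range(0x110000) if unicodedata.category(chr(cp)) == "Cc"
    )


# SP and NBSP are space separators; DEL and TAB are control characters.
for cp in (0x0020, 0x00A0, 0x007F, 0x0009):
    print(f"U+{cp:04X} {general_category(cp)}")

print("Cc characters:", count_control_characters())
```

On a typical build this should report Zs for both U+0020 and U+00A0, and a Cc total that matches the "65 characters, including DEL but not SP" wording quoted above.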

Octal Entity Reference Code

Octal codes are very useful and are still needed in some programs, for example in bash/shell programming, escape sequences, JavaScript, Perl, PostScript, etc. Various OS core (low-level) libraries and programs still use octal, and it especially needs to be shown for the Control Characters, Basic Latin, and similar Unicode character ranges.
For more octal charts and codes, see: https://utf8-chartable.de/unicode-utf8-table.pl?utf8=oct
More info: https://en.wikipedia.org/wiki/UTF-8#Examples
The Wikipedia page on Octal needs to be updated with more detail on how octal numbers are actually used in different types of computer programs. Literal conversion from hex/dec to octal is not enough for all cases. One sentence there that contains "\3nn" does mention UTF-8-based octal usage, but it needs elaboration. In a shell terminal, 3-digit octal codes can be used. For example, to show the ÷ (U+00F7) and € (U+20AC) signs, this code: ‟printf "Not-Bold. \303\267 . \342\202\254 (1) \xE2\x82\xAC (2) \x20AC (3) \u20AC (4) \U000020AC (5). \033[1mBold\033[0m.\n";”
or this code: ‟echo $'Not-Bold. \303\267 . \342\202\254 (1) \xE2\x82\xAC (2) \x20AC (3) \u20AC (4) \U000020AC (5). \033[1mBold\033[0m.';”
will both be displayed as: ‟Not-Bold. ÷ . € (1) € (2) \x20AC (3) \u20AC (4) \U000020AC (5). Bold.” (the old bash v3.2.57 shell in macOS Catalina (10.15.x) did not support formats (3), (4), and (5)). € = U+20AC = decimal code point 8364 = octal code point 20254 = UTF-8 octal \342\202\254 = UTF-8 hex \xE2\x82\xAC.
To convert a symbol/character into octal, you may do this:
printf 👍 | od -t o1
0000000 360 237 221 215   <-- Octal Unicode code point 372115 (U+1F44D)
          ^  ^^  ^^  ^^
--atErik1 (talk) 13:43, 5 September 2020 (UTC)
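
The same conversions can also be sketched outside the shell. The following Python snippet is only an illustration of the arithmetic described above, not something taken from the article or from any tool mentioned here; the function names utf8_octal_escape and code_point_octal are made up for this example.

```python
def utf8_octal_escape(text: str) -> str:
    """Return the UTF-8 bytes of text as backslash-octal escapes (\\nnn)."""
    return "".join(f"\\{byte:03o}" for byte in text.encode("utf-8"))


def code_point_octal(char: str) -> str:
    """Return the octal form of a single character's Unicode code point."""
    return format(ord(char), "o")


# The three characters used in the comment above.
for ch in ("÷", "€", "👍"):
    print(
        f"U+{ord(ch):04X}  "
        f"octal code point {code_point_octal(ch)}  "
        f"UTF-8 octal {utf8_octal_escape(ch)}"
    )
```

The two functions are kept separate because the octal value of the code point (20254 for €) and the octal values of its UTF-8 bytes (\342\202\254) are different numbers, and shell escape sequences expect the latter.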

The mysterious # column

Hi, most of the tables from Basic Latin through Cyrillic have a rightmost column headed #. What is its significance? Without an explanation, the naive reader is left to guess. =8~/ Thx, ... PeterEasthope (talk) 02:59, 18 November 2022 (UTC)

It's the decimal value for the hexadecimal Unicode code point. I agree it should definitely be labeled better. DRMcCreedy (talk) 03:26, 18 November 2022 (UTC)

No, it isn't. The numbers start with "001" at the space, and increment through Latin Extended-A. They then cover select characters in Latin Extended-B and Additional, IPA Extensions, and Spacing Modifier Letters, then take up again in Greek and Coptic and Cyrillic. I have shepherded a script through the Unicode / ISO 10646 process, and I am confident I've never seen those values before. Van Isaac, GHTV^cont_WpWS 04:47, 18 November 2022 (UTC)

Sorry, I was looking at the wrong column. My best guess is it's some enumeration of the characters in WGL-4, MES-1 and MES-2. Maybe just MES-2, since the article says MES-2 contains all the characters in WGL-4 and MES-1. The WGL-4, MES-1 and MES-2 table splits the Unicode code point up by "row" and "cells", but you can see it going from U+0020–7E, 00A0–FF, 0100–017F, 018F, 0192, 01B7, etc., which matches the # column. No idea why this was added to the List of Unicode characters article, although the lede says "This article includes the 1062 characters in the Multilingual European Character Set 2 (MES-2) subset, and some additional related characters." DRMcCreedy (talk) 08:24, 18 November 2022 (UTC)

I noticed that the change was made by @Wbm1058:. Perhaps it would be best to ask him about the rationale behind it? Smbat.petrosyan (talk) 14:01, 11 March 2025 (UTC)

Been a long time since I spent any significant time working on this page. Note that I expanded the lead section on 15 August 2016 to explain this, and apparently since then, someone decided that this was too much information, and shortened the lead to remove my more detailed explanation. Perhaps this longer explanation can be put back. The column was just my way of counting the MES-2 characters to make sure that they were all accounted for in this list. I guess I got up to 0926 before I ran out of steam and moved on to work on other things. 0927–1062 would still be in the bottom tables which haven't been converted to lists which include a Description column yet. Note the column heading MES-2 Rationale starting at List of Unicode characters#Latin Extended-B where MES-2 starts being selective, and doesn't include everything. – wbm1058 (talk) 14:58, 11 March 2025 (UTC)

This 29 December 2022 edit was a misguided move of my text as a "self-reference in the opening to a proper hatnote." – wbm1058 (talk) 15:10, 11 March 2025 (UTC)

And then this 10 September 2023 edit removed the misguided hatnote. – wbm1058 (talk) 15:18, 11 March 2025 (UTC)

The real problem is the rejected/boxed ones.

They are just boxes! No significance. 2804:663C:2D07:97C0:B103:6474:A7EA:4A7F (talk) 20:40, 7 April 2025 (UTC)

Many Unicode characters will no doubt show as boxes unless you have supporting fonts installed on your device. See Help:Multilingual support for more information. DRMcCreedy (talk) 00:00, 8 April 2025 (UTC)