Note that the current version of this proposal is still a bit unsorted due to the many different aspects of the Unicode-Python integration.
The latest version of this document is always available at:
http://starship.python.net/~lemburg/unicode-proposal.txt
Older versions are available as:
http://starship.python.net/~lemburg/unicode-proposal-X.X.txt
The Unicode implementation has to make some assumption about the encoding of 8-bit strings passed to it for coercion, and about the encoding to use for conversions between strings and Unicode when no specific encoding is given. This encoding is called the <default encoding> throughout this text.
For this, the implementation maintains a global which can be set in the site.py Python startup script. Subsequent changes are not possible. The <default encoding> can be set and queried using two sys module APIs, sys.setdefaultencoding(encoding) and sys.getdefaultencoding().
If not otherwise defined or set, the <default encoding> defaults to 'ascii'. This encoding is also the startup default of Python (and is in effect before site.py is executed).
Note that the default site.py startup module contains disabled optional code which can set the <default encoding> according to the encoding defined by the current locale. The locale module is used to extract the encoding from the locale default settings defined by the OS environment (see locale.py). If the encoding cannot be determined, is unknown or is unsupported, the code falls back to setting the <default encoding> to 'ascii'. To enable this code, edit the site.py file or place the appropriate code into the sitecustomize.py module of your Python installation.
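As a rough illustration of the query side of this mechanism, the sketch below assumes a modern Python 3 interpreter, which the proposal predates. There, sys.setdefaultencoding() is no longer available, but sys.getdefaultencoding() and the locale module still exist. The helper name describe_default_encodings is made up for this example.

```python
import locale
import sys


def describe_default_encodings():
    """Return (interpreter default encoding, locale-preferred encoding)."""
    interpreter_default = sys.getdefaultencoding()
    # Passing False asks for the locale's encoding without changing the
    # process locale, similar in spirit to the optional site.py code.
    locale_encoding = locale.getpreferredencoding(False)
    return interpreter_default, locale_encoding


interp_enc, locale_enc = describe_default_encodings()
print(interp_enc, locale_enc)
```

The two values can differ, which is the situation the optional locale-based code in site.py was meant to address.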
Python should provide the following built-in constructors for Unicode strings, available through __builtins__:
    u = unicode(encoded_string[,encoding=<default encoding>][,errors="strict"])
    u = u'<unicode-escape encoded Python string>'
    u = ur'<raw-unicode-escape encoded Python string>'
The 'unicode-escape' encoding is defined as follows:
    - all non-escape characters represent themselves as Unicode ordinals (e.g. 'a' -> U+0061);
    - all existing Python escape sequences are interpreted as Unicode ordinals;
    - a new escape sequence, \uXXXX, represents U+XXXX.
For an explanation of the possible values for errors, see the Codec section below.
Examples:
    u'abc'          -> U+0061 U+0062 U+0063
    u'\u1234'       -> U+1234
    u'abc\u1234\n'  -> U+0061 U+0062 U+0063 U+1234 U+000A
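The mapping from escaped source text to Unicode ordinals can be checked with a short sketch. It assumes modern Python 3, where the codec is reachable as "unicode_escape" through bytes.decode(). The helper code_points is a hypothetical name used only here.

```python
def code_points(text):
    """Format each character of text as a U+XXXX ordinal."""
    return ["U+%04X" % ord(ch) for ch in text]


# The input is kept as ASCII bytes so that the escapes are still unprocessed
# when they reach the codec.
raw = b"abc\\u1234\\n"
decoded = raw.decode("unicode_escape")
print(code_points(decoded))
```

The trailing \n escape is decoded to a single line feed character, not to a backslash.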
The 'raw-unicode-escape' encoding is defined as follows:
    - a \uXXXX sequence represents the U+XXXX Unicode character if and only if the number of leading backslashes is odd;
    - all other characters represent themselves as Unicode ordinals (e.g. 'b' -> U+0062).
Note that you should provide some hint about the encoding you used to write your programs, as a pragma line in one of the first few comment lines of the source file (e.g. '# source file encoding: latin-1'). If you only use 7-bit ASCII, everything is fine and no such notice is needed. If you include Latin-1 characters not defined in ASCII, it may well be worthwhile to include a hint, since people in other countries will want to be able to read your source strings too.
Unicode objects have a method .encode([encoding=<default encoding>]) which returns a Python string encoding the Unicode string using the given scheme (see Codecs).
    print u := print u.encode()   # using the <default encoding>
    str(u)  := u.encode()         # using the <default encoding>
    repr(u) := "u%s" % repr(u.encode('unicode-escape'))
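A small sketch of .encode() with different schemes follows. It assumes modern Python 3, where the text type is str and the encoded result is a bytes object rather than the 8-bit string type described here.

```python
text = "abc\u1234"

# Explicit encoding name.
utf8_bytes = text.encode("utf-8")
# The escape encoding that the repr() definition above is built on.
escaped = text.encode("unicode_escape")
# An error strategy other than the default 'strict'.
ascii_lossy = text.encode("ascii", "replace")

print(utf8_bytes, escaped, ascii_lossy)
```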
Also see Internal Argument Parsing and Buffer Interface for details on how other APIs written in C will treat Unicode objects.
Since Unicode 3.0 has a 32-bit ordinal character set, the implementation should provide 32-bit aware ordinal conversion APIs:
    ord(u[:1]) (this is the standard ord() extended to work with Unicode objects)
        --> Unicode ordinal number (32-bit)
    unichr(i)
        --> Unicode object for character i (provided it is 32-bit);
            ValueError otherwise
Both APIs should go into __builtins__, just like their string counterparts ord() and chr().
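The round trip between a character and its ordinal can be sketched as follows. This assumes modern Python 3, where chr() plays the role that unichr() has in this proposal. The helper name to_ordinal_and_back is hypothetical.

```python
def to_ordinal_and_back(ch):
    """Return the ordinal of ch and the character rebuilt from it."""
    ordinal = ord(ch)
    return ordinal, chr(ordinal)


ordinal, roundtrip = to_ordinal_and_back("\u1234")

# Values outside the supported range are rejected with ValueError.
try:
    chr(0x110000)
    out_of_range_rejected = False
except ValueError:
    out_of_range_rejected = True
```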
Note that Unicode provides space for private encodings. Usage of these can cause different output representations on different machines. This is not a Python or Unicode problem, but a machine setup and maintenance one.
Unicode objects should return the same hash value as their ASCII equivalent strings. Unicode strings holding non-ASCII values are not guaranteed to return the same hash values as the default encoded equivalent string representation.
When compared using cmp() (or PyObject_Compare()), the implementation should mask TypeErrors raised during the conversion, to remain in sync with the string behavior. All other errors, such as ValueErrors raised during coercion of strings to Unicode, should not be masked and should be passed through to the user.
In containment tests ('a' in u'abc' and u'a' in 'abc') both sides should be coerced to Unicode before applying the test. Errors occurring during coercion (e.g. None in u'abc') should not be masked.
    u + s := u + unicode(s)
    s + u := unicode(s) + u
All string methods should delegate the call to an equivalent Unicode object method call by converting all involved strings to Unicode and then applying the arguments to the Unicode method of the same name, e.g.
    string.join((s,u),sep) := (s + sep) + u
    sep.join((s,u)) := (s + sep) + u
For a discussion of %-formatting with respect to Unicode objects, see Formatting Markers.
UnicodeError is defined in the exceptions module as a subclass of ValueError. It is available at the C level via PyExc_UnicodeError. All exceptions related to Unicode encoding/decoding should be subclasses of UnicodeError.
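Because every encoding or decoding failure derives from UnicodeError, one except clause is enough to handle them all. The sketch below assumes modern Python 3. The helper try_decode is a hypothetical name for illustration.

```python
def try_decode(data, encoding):
    """Return the decoded text, or None if data is invalid for encoding."""
    try:
        return data.decode(encoding)
    except UnicodeError:
        # Covers UnicodeDecodeError and any other UnicodeError subclass.
        return None


print(try_decode(b"abc", "ascii"))
print(try_decode(b"\xff", "ascii"))
```

Code that already catches ValueError keeps working too, since UnicodeError is one of its subclasses.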
A Codec (see Codec Interface Definition) search registry should be implemented by a module "codecs":
    codecs.register(search_function)
Search functions are expected to take one argument, the encoding name in all lower case letters with hyphens and spaces converted to underscores, and return a tuple of functions (encoder, decoder, stream_reader, stream_writer). The encoder and decoder have the same interface as the .encode()/.decode() methods of Codec instances. The stream_reader and stream_writer are factory functions called as factory(stream,errors='strict').
In case a search function cannot find a given encoding, it should return None.
Aliasing support for encodings is left to the search functions to implement.
The codecs module will maintain an encoding cache for performance reasons. Encodings are first looked up in the cache. If not found, the list of registered search functions is scanned. If no codecs tuple is found, a LookupError is raised. Otherwise, the codecs tuple is stored in the cache and returned to the caller.
To query the Codec instance the following API should be used:
    codecs.lookup(encoding)
This will either return the found codecs tuple or raise a LookupError.
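The registry can be tried out with a small search function. This is a sketch under stated assumptions: it targets modern Python 3, where codecs.lookup() returns a codecs.CodecInfo object (a tuple subclass) instead of a plain 4-tuple, and the alias "example-latin" and the function example_search are invented for this illustration.

```python
import codecs


def example_search(name):
    """Hypothetical search function: resolves one alias, else returns None."""
    # Normalise again here, since the normalisation applied by the registry
    # before calling search functions has varied between Python versions.
    normalized = name.lower().replace("-", "_").replace(" ", "_")
    if normalized == "example_latin":
        return codecs.lookup("iso-8859-1")
    return None


codecs.register(example_search)

info = codecs.lookup("example-latin")
encoded, consumed = info.encode("caf\u00e9")

# Names that no search function knows end in LookupError.
try:
    codecs.lookup("no-such-encoding-example")
    missing_raises = False
except LookupError:
    missing_raises = True
```

Returning None for unknown names lets the registry move on to the next registered search function.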
Standard codecs should live inside an encodings/ package directory in the Standard Python Code Library. The __init__.py file of that directory should include a Codec Lookup compatible search function implementing a lazy, module-based codec lookup.
Python should provide a few standard codecs for the most relevant encodings, e.g.
    'utf-8':              8-bit variable length encoding
    'utf-16':             16-bit variable length encoding (little/big endian)
    'utf-16-le':          utf-16 but explicitly little endian
    'utf-16-be':          utf-16 but explicitly big endian
    'ascii':              7-bit ASCII codepage
    'iso-8859-1':         ISO 8859-1 (Latin 1) codepage
    'unicode-escape':     see Unicode Constructors for a definition
    'raw-unicode-escape': see Unicode Constructors for a definition
    'native':             dump of the Internal Format used by Python
Common aliases should also be provided by default, e.g. 'latin-1' for 'iso-8859-1'.
Note: 'utf-16' should be implemented by using and requiring byte order marks (BOM) for file input/output.
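The difference between 'utf-16' and its explicit-endian variants, and the alias behaviour, can be observed directly. The sketch assumes modern Python 3 and its bundled codecs.

```python
import codecs

with_bom = "A".encode("utf-16")      # byte order mark, then native-order data
little = "A".encode("utf-16-le")     # explicit byte order, no BOM
big = "A".encode("utf-16-be")        # explicit byte order, no BOM

# 'latin-1' and 'iso-8859-1' resolve to the same codec.
same_codec = codecs.lookup("latin-1").name == codecs.lookup("iso-8859-1").name
```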
All other encodings, such as the CJK ones to support Asian scripts, should be implemented in separate packages which do not get included in the core Python distribution and are not a part of this proposal.
The following base classes should be defined in the module "codecs". They provide not only templates for use by encoding module implementors, but also define the interface which is expected by the Unicode implementation.
Note that the Codec Interface defined here is well suited to a larger range of applications. The Unicode implementation expects Unicode objects on input for .encode() and .write(), and character buffer compatible objects on input for .decode(). Output of .encode() and .read() should be a Python string, and .decode() must return a Unicode object.
First, we have the stateless encoders/decoders. These do not work in chunks as the stream codecs (see below) do, because all components are expected to be available in memory.
class Codec:
    """ Defines the interface for stateless encoders/decoders.
        The .encode()/.decode() methods may implement different
        error handling schemes by providing the errors argument.
        These string values are defined:
         'strict'  - raise a ValueError (or a subclass)
         'ignore'  - ignore the character and continue with the next
         'replace' - replace with a suitable replacement character;
                     Python will use the official U+FFFD
                     REPLACEMENT CHARACTER for the builtin
                     Unicode codecs.
    """
    def encode(self,input,errors='strict'):
        """ Encodes the object input and returns a tuple
            (output object, length consumed).
            errors defines the error handling to apply.
            It defaults to 'strict' handling.
            The method may not store state in the Codec instance.
            Use StreamCodec for codecs which have to keep state
            in order to make encoding/decoding efficient.
        """
        ...
    def decode(self,input,errors='strict'):
        """ Decodes the object input and returns a tuple
            (output object, length consumed).
            input must be an object which provides the bf_getreadbuf
            buffer slot. Python strings, buffer objects and memory
            mapped files are examples of objects providing this slot.
            errors defines the error handling to apply.
            It defaults to 'strict' handling.
            The method may not store state in the Codec instance.
            Use StreamCodec for codecs which have to keep state
            in order to make encoding/decoding efficient.
        """
        ...
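The stateless interface and the three error handling values can be exercised with the codecs that ship with the interpreter. This sketch assumes modern Python 3, where codecs.lookup() exposes the stateless functions as the .encode and .decode attributes of the returned object.

```python
import codecs

ascii_codec = codecs.lookup("ascii")

# Stateless calls return (output object, length consumed).
output, consumed = ascii_codec.encode("abc")

# 'ignore' drops the unencodable character, 'replace' substitutes one.
ignored, _ = ascii_codec.encode("abc\u1234", "ignore")
replaced, _ = ascii_codec.encode("abc\u1234", "replace")

# When decoding, 'replace' inserts U+FFFD REPLACEMENT CHARACTER.
decoded, _ = codecs.lookup("utf-8").decode(b"ok\xff", "replace")

# 'strict' raises a ValueError subclass.
try:
    ascii_codec.encode("abc\u1234", "strict")
    strict_raised = False
except ValueError:
    strict_raised = True
```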
StreamWriter and StreamReader define the interface for stateful encoders/decoders which work on streams. These allow processing of the data in chunks to use memory efficiently. If you have large strings in memory, you may want to wrap them with cStringIO objects and then use these codecs on them to be able to do chunk processing as well, e.g. to provide progress information to the user.
class StreamWriter(Codec):
    def __init__(self,stream,errors='strict'):
        """ Creates a StreamWriter instance.
            stream must be a file-like object open for writing
            (binary) data.
            The StreamWriter may implement different error handling
            schemes by providing the errors keyword argument.
            These parameters are defined:
             'strict' - raise a ValueError (or a subclass)
             'ignore' - ignore the character and continue with the next
             'replace'- replace with a suitable replacement character
        """
        self.stream = stream
        self.errors = errors
    def write(self,object):
        """ Writes the object's contents encoded to self.stream.
        """
        data, consumed = self.encode(object,self.errors)
        self.stream.write(data)
    def writelines(self, list):
        """ Writes the concatenated list of strings to the stream
            using .write().
        """
        self.write(''.join(list))
    def reset(self):
        """ Flushes and resets the codec buffers used for keeping state.
            Calling this method should ensure that the data on the
            output is put into a clean state that allows appending
            of new fresh data without having to rescan the whole
            stream to recover state.
        """
        pass
    def __getattr__(self,name,
                    getattr=getattr):
        """ Inherit all other methods from the underlying stream.
        """
        return getattr(self.stream,name)
class StreamReader(Codec):
    def __init__(self,stream,errors='strict'):
        """ Creates a StreamReader instance.
            stream must be a file-like object open for reading
            (binary) data.
            The StreamReader may implement different error handling
            schemes by providing the errors keyword argument.
            These parameters are defined:
             'strict' - raise a ValueError (or a subclass)
             'ignore' - ignore the character and continue with the next
             'replace'- replace with a suitable replacement character
        """
        self.stream = stream
        self.errors = errors
    def read(self,size=-1):
        """ Decodes data from the stream self.stream and returns
            the resulting object.
            size indicates the approximate maximum number of bytes
            to read from the stream for decoding purposes. The
            decoder can modify this setting as appropriate. The
            default value -1 indicates to read and decode as much
            as possible. size is intended to prevent having to
            decode huge files in one step.
            The method should use a greedy read strategy, meaning
            that it should read as much data as is allowed within
            the definition of the encoding and the given size,
            e.g. if optional encoding endings or state markers are
            available on the stream, these should be read too.
        """
        # Unsliced reading:
        if size < 0:
            return self.decode(self.stream.read())[0]
        # Sliced reading:
        read = self.stream.read
        decode = self.decode
        data = read(size)
        i = 0
        while 1:
            try:
                object, decodedbytes = decode(data)
            except ValueError,why:
                # This method is slow but should work under pretty
                # much all conditions; at most 10 tries are made
                i = i + 1
                newdata = read(1)
                if not newdata or i > 10:
                    raise
                data = data + newdata
            else:
                return object
    def readline(self, size=None):
        """ Read one line from the input stream and return the
            decoded data.
            Note: Unlike the .readlines() method, this method
            inherits the line breaking knowledge from the underlying
            stream's .readline() method -- there is currently no
            support for line breaking using the codec decoder due to
            lack of line buffering. Subclasses should however, if
            possible, try to implement this method using their own
            knowledge of line breaking.
            size, if given, is passed as size argument to the
            stream's .readline() method.
        """
        if size is None:
            line = self.stream.readline()
        else:
            line = self.stream.readline(size)
        return self.decode(line)[0]
    def readlines(self, sizehint=None):
        """ Read all lines available on the input stream
            and return them as a list of lines.
            Line breaks are implemented using the codec's decoder
            method and are included in the list entries.
            sizehint, if given, is passed as size argument to the
            stream's .read() method.
        """
        if sizehint is None:
            data = self.stream.read()
        else:
            data = self.stream.read(sizehint)
        return self.decode(data)[0].splitlines(1)
    def reset(self):
        """ Resets the codec buffers used for keeping state.
            Note that no stream repositioning should take place.
            This method is primarily intended to be able to
            recover from decoding errors.
        """
        pass
    def __getattr__(self,name,
                    getattr=getattr):
        """ Inherit all other methods from the underlying stream.
        """
        return getattr(self.stream,name)
Stream codec implementors are free to combine the StreamWriter and StreamReader interfaces into one class. Even combining all of these with the Codec class should be possible.
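A stream writer and reader pair can be tried on an in-memory byte stream. The sketch assumes modern Python 3, where the helpers codecs.getwriter() and codecs.getreader() return the stream factories of a registered codec. Those helpers are not named in this text.

```python
import codecs
import io

raw = io.BytesIO()

# Wrap the binary stream: text goes in, encoded bytes reach the stream.
writer = codecs.getwriter("utf-8")(raw)
writer.write("caf\u00e9\n")
writer.writelines(["one\n", "two\n"])

# Rewind and wrap the same stream for decoding.
raw.seek(0)
reader = codecs.getreader("utf-8")(raw)
first_line = reader.readline()
remaining = reader.readlines()
```

Any attribute the wrappers do not define themselves, such as seek(), is forwarded to the underlying stream.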
Implementors are free to add additional methods to enhance the codec functionality or to provide extra state information needed for the methods to work. The internal codec implementation will only use the above interfaces, though.
The Unicode implementation is not required to use these base classes; only the interfaces must match. This allows writing Codecs as extension types.
As a guideline, large mapping tables should be implemented using static C data in separate (shared) extension modules. That way multiple processes can share the same data.
A tool to auto-convert Unicode mapping files to mapping modules should be provided to simplify support for additional mappings (see References).
The .split() method will have to know about what is considered whitespace in Unicode.
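Whitespace-aware splitting can be observed with a few non-ASCII space characters. The sketch assumes modern Python 3, whose str.split() without arguments splits on Unicode whitespace.

```python
# U+2003 EM SPACE and U+3000 IDEOGRAPHIC SPACE are whitespace in Unicode,
# just like the ASCII tab.
text = "alpha\u2003beta\u3000gamma\tdelta"
words = text.split()
print(words)
```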
Case conversion is rather complicated with Unicode data, since there are many different conditions to respect. See
http://www.unicode.org/unicode/reports/tr21/
for some guidelines on implementing case conversion.
For Python, we should only implement the 1-1 conversions included in Unicode. Locale dependent and other special case conversions (see the Unicode standard file SpecialCasing.txt) should be left to user land routines and not go into the core interpreter.
The methods .capitalize() and .iscapitalized() should follow the case mapping algorithm defined in the above technical report as closely as possible.
Line breaking should be done for all Unicode characters having the B property as well as the combinations CRLF, CR, LF (interpreted in that order) and other special line separators defined by the standard. See
http://www.unicode.org/unicode/reports/tr13/
for some guidelines on line breaks and newline handling.
The Unicode type should provide a .splitlines() method which returns a list of lines according to the above specification. See Unicode Methods.
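The line breaking rules can be illustrated with a string that mixes several separators. This sketch assumes modern Python 3, whose str.splitlines() recognises CRLF, CR, LF and the Unicode LINE SEPARATOR among others.

```python
# CRLF counts as one break, not two; U+2028 is LINE SEPARATOR.
text = "one\r\ntwo\rthree\nfour\u2028five"

lines = text.splitlines()
with_breaks = text.splitlines(True)   # keep the line breaks in the entries
```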
A separate module "unicodedata" should provide a compact interface to all Unicode character properties defined in the standard's UnicodeData.txt file.
Among other things, these properties provide ways to recognize numbers, digits, spaces, whitespace, etc.
Since this module will have to provide access to all Unicode characters, it will eventually have to contain the data from UnicodeData.txt, which takes up around 600kB. For this reason, the data should be stored in static C data. This enables compilation as a shared module which the underlying OS can share between processes (unlike normal Python code modules).
There should be a standard Python interface for accessing this information so that other implementors can plug in their own, possibly enhanced versions, e.g. ones that decompress the data on the fly.
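A few property lookups show the kind of interface meant here. The sketch assumes the unicodedata module as it exists in modern Python 3. The helper describe is a hypothetical name.

```python
import unicodedata


def describe(ch):
    """Collect a few character properties for a single character."""
    return {
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": unicodedata.category(ch),
        "numeric": unicodedata.numeric(ch, None),
        "is_space": ch.isspace(),
    }


info_digit = describe("9")
info_half = describe("\u00bd")     # VULGAR FRACTION ONE HALF
info_space = describe("\u2003")    # EM SPACE
```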
Support for private code point areas is left to user land Codecs and is not explicitly integrated into the core. Note that due to the Internal Format being implemented, only the area between \uE000 and \uF8FF is usable for private encodings.
The internal format for Unicode objects should use a Python specific fixed format <PythonUnicode>, implemented as 'unsigned short' (or another unsigned numeric type having 16 bits). Byte order is platform dependent.
This format will hold UTF-16 encodings of the corresponding Unicode ordinals. The Python Unicode implementation will address these values as if they were UCS-2 values. UCS-2 and UTF-16 are the same for all currently defined Unicode character points. UTF-16 without surrogates provides access to about 64k characters and covers all characters in the Basic Multilingual Plane (BMP) of Unicode.
It is the Codec's responsibility to ensure that the data it passes to the Unicode object constructor respects this assumption. The constructor does not check the data for Unicode compliance or for use of surrogates.
Future implementations can extend the 32 bit restriction to the full set of all UTF-16 addressable characters (around 1M characters).
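The relationship between UCS-2 and UTF-16 comes down to surrogate pairs for ordinals above U+FFFF. The sketch below works through the standard UTF-16 arithmetic and compares it with the 'utf-16-be' codec of modern Python 3 (an assumption). The function to_surrogate_pair is a hypothetical helper.

```python
def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)      # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits of the offset
    return high, low


high, low = to_surrogate_pair(0x1F600)
supplementary_units = chr(0x1F600).encode("utf-16-be")
bmp_units = "\u1234".encode("utf-16-be")
```

A BMP character occupies one 16-bit unit, which is why addressing the buffer as UCS-2 works as long as no surrogates are present.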
The Unicode API should provide interface routines from <PythonUnicode> to the compiler's wchar_t, which can be 16 or 32 bit depending on the compiler/libc/platform being used.
Unicode objects should have a pointer to a cached Python string object <defenc> holding the object's value using the <default encoding>. This is needed for performance and internal parsing reasons (see Internal Argument Parsing). The buffer is filled when the first conversion request to the <default encoding> is issued on the object.
Interning is not needed (for now), since Python identifiers are defined as being ASCII only.
codecs.BOM should return the byte order mark (BOM) for the format used internally. The codecs module should provide the following additional constants for convenience and reference (codecs.BOM will either be BOM_BE or BOM_LE depending on the platform):
    BOM_BE: '\376\377'
      (corresponds to Unicode U+0000FEFF in UTF-16 on big endian
       platforms == ZERO WIDTH NO-BREAK SPACE)
    BOM_LE: '\377\376'
      (corresponds to Unicode U+0000FFFE in UTF-16 on little endian
       platforms == defined as being an illegal Unicode character)
    BOM4_BE: '\000\000\376\377'
      (corresponds to Unicode U+0000FEFF in UCS-4)
    BOM4_LE: '\377\376\000\000'
      (corresponds to Unicode U+0000FFFE in UCS-4)
Note that Unicode sees big endian byte order as being "correct". The swapped order is taken to be an indicator for a "wrong" format, hence the illegal character definition.
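The byte order constants can be used to sniff the byte order of UTF-16 data. This sketch assumes the codecs module of modern Python 3, which still provides BOM, BOM_BE and BOM_LE. The function detect_utf16_byte_order is a hypothetical helper.

```python
import codecs
import sys

# codecs.BOM is expected to match the platform's native byte order.
native_bom = codecs.BOM_LE if sys.byteorder == "little" else codecs.BOM_BE


def detect_utf16_byte_order(data):
    """Return 'big', 'little', or None if data carries no UTF-16 BOM."""
    if data.startswith(codecs.BOM_BE):
        return "big"
    if data.startswith(codecs.BOM_LE):
        return "little"
    return None


detected = detect_utf16_byte_order("x".encode("utf-16"))
```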
The configure script should provide aid in deciding whether Python can use the native wchar_t type or not (it has to be a 16-bit unsigned type).
Implement the buffer interface using the <defenc> Python string object as basis for bf_getcharbuf and the internal buffer for bf_getreadbuf. If bf_getcharbuf is requested and the <defenc> object does not yet exist, it is created first.
Note that, as a special case, the parser marker "s#" will not return raw Unicode UTF-16 data (which bf_getreadbuf returns), but instead tries to encode the Unicode object using the default encoding and then returns a pointer to the resulting string object (or raises an exception in case the conversion fails). This was done in order to prevent accidentally writing binary data to an output stream which the other end might not recognize.
This has the advantage of being able to write to output streams (which typically use this interface) without additional specification of the encoding to use.
If you need to access the read buffer interface of Unicode objects, use the PyObject_AsReadBuffer() interface.
The internal format can also be accessed using the 'unicode-internal' codec, e.g. via u.encode('unicode-internal').
Pickle and marshal should have native Unicode object support. The objects should be encoded using platform independent encodings.
Marshal should use UTF-8, and Pickle should choose either Raw-Unicode-Escape (in text mode) or UTF-8 (in binary mode) as encoding. Using UTF-8 instead of UTF-16 has the advantage of eliminating the need to store a BOM mark.
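A round trip through both serializers shows the platform independent encodings at work. The sketch assumes modern Python 3, where pickle protocol 0 corresponds to the text mode mentioned here and protocol 2 is one of the binary protocols.

```python
import marshal
import pickle

text = "caf\u00e9 \u1234"

from_marshal = marshal.loads(marshal.dumps(text))

text_pickle = pickle.dumps(text, protocol=0)     # text mode
binary_pickle = pickle.dumps(text, protocol=2)   # binary mode

from_text_pickle = pickle.loads(text_pickle)
from_binary_pickle = pickle.loads(binary_pickle)
```

Inspecting the raw pickle bytes shows a \u1234 escape in the text protocol and the UTF-8 byte sequence in the binary one.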
Secret Labs AB is working on a Unicode-aware regular expression machinery. It works on plain 8-bit, UCS-2, and (optionally) UCS-4 internal character buffers.
Also see
http://www.unicode.org/unicode/reports/tr18/
for some remarks on how to treat Unicode REs.
Format markers are used in Python format strings. If Python strings are used as format strings, the following interpretations should be in effect:
    '%s': For Unicode objects this will cause coercion of the
          whole format string to Unicode. Note that you should use
          a Unicode format string to start with for performance
          reasons.
In case the format string is a Unicode object, all parameters are coerced to Unicode first and then put together and formatted according to the format string. Numbers are first converted to strings and then to Unicode.
    '%s': Python strings are interpreted as Unicode strings using
          the <default encoding>. Unicode objects are taken as is.
All other string formatters should work accordingly.
Example:
    u"%s %s" % (u"abc", "abc") == u"abc abc"
These markers are used by the PyArg_ParseTuple() APIs:
    "U":  Check for Unicode object and return a pointer to it.
    "s":  For Unicode objects: return a pointer to the object's
          <defenc> buffer (which uses the <default encoding>).
    "s#": Access to the default encoded version of the Unicode object
          (see Buffer Interface); note that the length relates to
          the length of the default encoded string rather than the
          Unicode object length.
    "t#": Same as "s#".
    "es":
          Takes two parameters: encoding (const char *) and
          buffer (char **).
          The input object is first coerced to Unicode in the usual
          way and then encoded into a string using the given encoding.
          On output, a buffer of the needed size is allocated and
          returned through *buffer as a NULL-terminated string.
          The encoded string may not contain embedded NULL characters.
          The caller is responsible for calling PyMem_Free() to free
          the allocated *buffer after usage.
    "es#":
          Takes three parameters: encoding (const char *),
          buffer (char **) and buffer_len (int *).
          The input object is first coerced to Unicode in the usual
          way and then encoded into a string using the given encoding.
          If *buffer is non-NULL, *buffer_len must be set to
          sizeof(buffer) on input. Output is then copied to *buffer.
          If *buffer is NULL, a buffer of the needed size is allocated
          and the output is copied into it. *buffer is then updated to
          point to the allocated memory area.
          The caller is responsible for calling PyMem_Free() to free
          the allocated *buffer after usage.
          In both cases *buffer_len is updated to the number of
          characters written (excluding the trailing NULL-byte).
          The output buffer is assured to be NULL-terminated.
Examples:
Using "es#" with auto-allocation:
    static PyObject *
    test_parser(PyObject *self,
                PyObject *args)
    {
        PyObject *str;
        const char *encoding = "latin-1";
        char *buffer = NULL;
        int buffer_len = 0;
        if (!PyArg_ParseTuple(args, "es#:test_parser",
                              encoding, &buffer, &buffer_len))
            return NULL;
        if (!buffer) {
            PyErr_SetString(PyExc_SystemError,
                            "buffer is NULL");
            return NULL;
        }
        str = PyString_FromStringAndSize(buffer, buffer_len);
        PyMem_Free(buffer);
        return str;
    }
Using "es" with auto-allocation returning a NULL-terminated string:
    static PyObject *
    test_parser(PyObject *self,
                PyObject *args)
    {
        PyObject *str;
        const char *encoding = "latin-1";
        char *buffer = NULL;
        if (!PyArg_ParseTuple(args, "es:test_parser",
                              encoding, &buffer))
            return NULL;
        if (!buffer) {
            PyErr_SetString(PyExc_SystemError,
                            "buffer is NULL");
            return NULL;
        }
        str = PyString_FromString(buffer);
        PyMem_Free(buffer);
        return str;
    }
Using "es#" with a pre-allocated buffer:
    static PyObject *
    test_parser(PyObject *self,
                PyObject *args)
    {
        PyObject *str;
        const char *encoding = "latin-1";
        char _buffer[10];
        char *buffer = _buffer;
        int buffer_len = sizeof(_buffer);
        if (!PyArg_ParseTuple(args, "es#:test_parser",
                              encoding, &buffer, &buffer_len))
            return NULL;
        if (!buffer) {
            PyErr_SetString(PyExc_SystemError,
                            "buffer is NULL");
            return NULL;
        }
        str = PyString_FromStringAndSize(buffer, buffer_len);
        return str;
    }
Since file.write(object) and most other stream writers use the "s#" or "t#" argument parsing marker for querying the data to write, the default encoded string version of the Unicode object will be written to the streams (see Buffer Interface).
For explicit handling of files using Unicode, the standard stream codecs as available through the codecs module should be used.
The codecs module should provide a short-cut open(filename,mode,encoding) which also ensures that mode contains the 'b' character when needed.
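The short-cut can be tried with a temporary file. The sketch assumes modern Python 3, where codecs.open(filename, mode, encoding) still exists, although recent releases steer users toward the built-in open() with an encoding argument. The helper write_and_read_back is a hypothetical name.

```python
import codecs
import os
import tempfile


def write_and_read_back(text, encoding):
    """Write text through codecs.open(), return (raw bytes, decoded text)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Text in, encoded bytes on disk.
        with codecs.open(path, "w", encoding) as handle:
            handle.write(text)
        # Inspect the bytes that were actually written.
        with open(path, "rb") as handle:
            raw = handle.read()
        # Encoded bytes on disk, text out.
        with codecs.open(path, "r", encoding) as handle:
            decoded = handle.read()
        return raw, decoded
    finally:
        os.remove(path)


raw, decoded = write_and_read_back("caf\u00e9", "utf-8")
```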
Only the user knows what encoding the input data uses, so no special magic is applied. The user will have to explicitly convert the string data to Unicode objects as needed or use the file wrappers defined in the codecs module (see File/Stream Output).
All Python string methods, plus:
    .encode([encoding=<default encoding>][,errors="strict"])
        --> see Unicode Output
    .splitlines([include_breaks=0])
        --> breaks the Unicode string into a list of (Unicode) lines;
            returns the lines with line breaks included if
            include_breaks is true. See Line Breaks for a
            specification of how line breaking is done.
We should use Fredrik Lundh's Unicode object implementation as the basis. It already implements most of the string methods needed and provides a well written code base which we can build upon.
The object sharing implemented in Fredrik's implementation should be dropped.
Test cases should follow those in Lib/test/test_string.py and include additional checks for the Codec Registry and the Standard Codecs.
Unicode Consortium:
http://www.unicode.org/
Unicode FAQ:
http://www.unicode.org/unicode/faq/
Unicode 3.0:
http://www.unicode.org/unicode/standard/versions/Unicode3.0.html
Unicode-TechReports:
http://www.unicode.org/unicode/reports/techreports.html
Unicode-Mappings:
http://www.unicode.org/Public/MAPPINGS/
Introduction to Unicode (a little outdated but still nice to read):
http://www.nada.kth.se/i18n/ucs/unicode-iso10646-oview.html
For comparison:
Introducing Unicode to ECMAScript (aka JavaScript) --
http://www-4.ibm.com/software/developer/library/internationalization-support.html
IANA Character Set Names:
ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets
Discussion of UTF-8 and Unicode support for POSIX and Linux:
http://www.cl.cam.ac.uk/~mgk25/unicode.html
Encodings:
Overview:
http://czyborra.com/utf/
UCS-2:
http://www.uazone.com/multiling/unicode/ucs2.html
UTF-7:
Defined in RFC2152,
http://www.uazone.com/multiling/ml-docs/rfc2152.txt
UTF-8:
Defined in RFC2279,
http://info.internet.isi.edu/in-notes/rfc/files/rfc2279.txt
UTF-16:
http://www.uazone.com/multiling/unicode/wg2n1035.html
History of this Proposal:
-------------------------
1.8: Fixed some URLs to the unicode.org site.
1.7: Added note about the changed behaviour of "s#".
1.6: Changed <defencstr> to <defenc> since this is the name used in the
implementation. Added notes about the usage of <defenc> in the
buffer protocol implementation.
1.5: Added notes about setting the <default encoding>. Fixed some
typos (thanks to Andrew Kuchling). Changed <defencstr> to <utf8str>.
1.4: Added note about mixed type comparisons and contains tests.
Changed treating of Unicode objects in format strings (if used
with '%s' % u they will now cause the format string to be
coerced to Unicode, thus producing a Unicode object on return).
Added link to IANA charset names (thanks to Lars Marius Garshol).
Added new codec methods .readline(), .readlines() and .writelines().
1.3: Added new "es" and "es#" parser markers
1.2: Removed POD about codecs.open()
1.1: Added note about comparisons and hash values. Added note about
case mapping algorithms. Changed stream codecs .read() and
.write() method to match the standard file-like object methods
(bytes consumed information is no longer returned by the methods)
1.0: changed encode Codec method to be symmetric to the decode method
(they both return (object, data consumed) now and thus become
interchangeable); removed __init__ method of Codec class (the
methods are stateless) and moved the errors argument down to the
methods; made the Codec design more generic w/r to type of input
and output objects; changed StreamWriter.flush to StreamWriter.reset
in order to avoid overriding the stream's .flush() method;
renamed .breaklines() to .splitlines(); renamed the module unicodec
to codecs; modified the File I/O section to refer to the stream codecs.
0.9: changed errors keyword argument definition; added 'replace' error
handling; changed the codec APIs to accept buffer like objects on
input; some minor typo fixes; added Whitespace section and
included references for Unicode characters that have the whitespace
and the line break characteristic; added note that search functions
can expect lower-case encoding names; dropped slicing and offsets
in the codec APIs
0.8: added encodings package and raw unicode escape encoding; untabified
the proposal; added notes on Unicode format strings; added
.breaklines() method
0.7: added a whole new set of codec APIs; added a different encoder
lookup scheme; fixed some names
0.6: changed "s#" to "t#"; changed <defencbuf> to <defencstr> holding
a real Python string object; changed Buffer Interface to delegate
requests to <defencstr>'s buffer interface; removed the explicit
reference to the unicodec.codecs dictionary (the module can implement
this in way fit for the purpose); removed the settable default
encoding; move UnicodeError from unicodec to exceptions; "s#"
now returns the internal data; passed the UCS-2/UTF-16 checking
from the Unicode constructor to the Codecs
0.5: moved sys.bom to unicodec.BOM; added sections on case mapping,
private use encodings and Unicode character properties
0.4: added Codec interface, notes on %-formatting, changed some encoding
details, added comments on stream wrappers, fixed some discussion
points (most important: Internal Format), clarified the
'unicode-escape' encoding, added encoding references
0.3: added references, comments on codec modules, the internal format,
bf_getcharbuffer and the RE engine; added 'unicode-escape' encoding
proposed by Tim Peters and fixed repr(u) accordingly
0.2: integrated Guido's suggestions, added stream codecs and file
wrapping
0.1: first version