Note that the current version of this proposal is still a bit unsorted because of the many different aspects of the Unicode-Python integration.
The latest version of this document is always available at:
http://starship.python.net/~lemburg/unicode-proposal.txt
Older versions are available at:
http://starship.python.net/~lemburg/unicode-proposal-X.X.txt
The Unicode implementation has to make an assumption about the encoding of 8-bit strings passed to it for coercion, and about the encoding to use when converting between strings and Unicode if no specific encoding is given. This encoding is called the <default encoding> throughout this text.
For this, the Unicode implementation maintains a global which can be set in the site.py Python startup script. Subsequent changes are not possible. The <default encoding> can be set and queried using the two sys module APIs:
If not otherwise defined or set, the <default encoding> defaults to 'ascii'. This encoding is also the startup default of Python (and is in effect before site.py is executed).
Note that the default site.py startup module contains disabled optional code which can set the <default encoding> according to the encoding defined by the current locale. The locale module is used to extract the encoding from the locale default settings defined by the OS environment (see locale.py). If the encoding cannot be determined, is unknown or is unsupported, the code sets the <default encoding> to 'ascii'. To enable this code, edit the site.py file or place the appropriate code into the sitecustomize.py module of your Python installation.
Python should provide the following built-in constructors for Unicode strings, available through __builtins__:

    u = unicode(encoded_string[,encoding=<default encoding>][,errors="strict"])
    u = u'<unicode-escape encoded Python string>'
    u = ur'<raw-unicode-escape encoded Python string>'
The 'unicode-escape' encoding is defined as follows:
For an explanation of the possible values for errors, see the Codec section below.
Examples:

    u'abc'          -> U+0061 U+0062 U+0063
    u'\u1234'       -> U+1234
    u'abc\u1234\n'  -> U+0061 U+0062 U+0063 U+1234 U+000A
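The mapping in these examples can be checked with a short sketch. It assumes a modern Python 3 interpreter, where the built-in str type plays the role of the Unicode object proposed here and the `unicode_escape` codec is the descendant of 'unicode-escape'. The helper name `ordinals` is made up for illustration.

```python
def ordinals(text):
    """Return the code points of text in the U+XXXX notation used above."""
    return ["U+%04X" % ord(ch) for ch in text]

# The source text of a literal as raw ASCII bytes (the backslashes are literal here).
source = rb"abc\u1234\n"

# Decoding with unicode_escape interprets \uXXXX and the ordinary Python escapes.
decoded = source.decode("unicode_escape")

print(ordinals(decoded))
```

The newline escape ends up as a single line feed character, which is why the last ordinal is that of LF and not that of the backslash.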
The 'raw-unicode-escape' encoding is defined as follows:
Note that you should provide some hint about the encoding you used to write your programs, as a pragma line in one of the first few comment lines of the source file (e.g. '# source file encoding: latin-1'). If you only use 7-bit ASCII then everything is fine and no such notice is needed, but if you include Latin-1 characters not defined in ASCII, it may well be worthwhile including a hint, since people in other countries will want to be able to read your source strings too.
Unicode objects should have a method .encode([encoding=<default encoding>]) which returns a Python string encoding the Unicode string using the given scheme (see Codecs).

    print u := print u.encode()   # using the <default encoding>
    str(u)  := u.encode()         # using the <default encoding>
    repr(u) := "u%s" % repr(u.encode('unicode-escape'))

Also see Internal Argument Parsing and Buffer Interface for details on how other APIs written in C will treat Unicode objects.
Since Unicode 3.0 has a 32-bit ordinal character set, the implementation should provide 32-bit aware ordinal conversion APIs:

    ord(u[:1]) (this is the standard ord() extended to work with Unicode objects)
        --> Unicode ordinal number (32-bit)
    unichr(i)
        --> Unicode object for character i (provided it is 32-bit);
            ValueError otherwise

Both APIs should go into __builtins__ just like their string counterparts ord() and chr().
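The round trip between ordinals and one-character Unicode objects can be sketched as follows. The sketch assumes Python 3, where chr() has taken over the role proposed here for unichr(); `safe_chr` is a hypothetical helper, not part of the proposal.

```python
# In Python 3 (assumed here), chr() covers the role proposed for unichr().
snowman = chr(0x2603)              # ordinal -> one-character string
snowman_ordinal = ord(snowman)     # and back again

def safe_chr(i):
    """Hypothetical helper: return chr(i), or None when i is out of range."""
    try:
        return chr(i)
    except ValueError:
        return None

out_of_range = safe_chr(0x110000)  # one past the last valid code point
```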
Unicode provides space for private encodings. Using these can cause different output representations on different machines. This problem is not a Python or Unicode problem, but a machine setup and maintenance one.
Unicode objects should return the same hash value as their ASCII equivalent strings. Unicode strings holding non-ASCII values are not guaranteed to return the same hash value as the default encoded equivalent string representation.
When compared using cmp() (or PyObject_Compare()), the implementation should mask TypeErrors raised during the conversion, to remain in sync with the string behaviour. All other errors, such as ValueErrors raised during coercion of strings to Unicode, should not be masked and should be passed through to the user.
In containment tests ('a' in u'abc' and u'a' in 'abc') both sides should be coerced to Unicode before applying the test. Errors occurring during coercion (e.g. None in u'abc') should not be masked.
    u + s := u + unicode(s)
    s + u := unicode(s) + u
All string methods should delegate the call to an equivalent Unicode object method call, by converting all involved strings to Unicode and then applying the arguments to the Unicode method of the same name.
For example:

    string.join((s,u),sep) := (s + sep) + u
    sep.join((s,u))        := (s + sep) + u

For a discussion of %-formatting with respect to Unicode objects, see Formatting Markers.
UnicodeError is defined in the exceptions module as a subclass of ValueError. It is available at the C level via PyExc_UnicodeError. All exceptions related to Unicode encoding/decoding should be subclasses of UnicodeError.
A Codec (see Codec Interface Definition) search registry should be implemented by a module "codecs":
codecs.register(search_function)
Search functions are expected to take one argument, the encoding name in all lower case letters with hyphens and spaces converted to underscores, and to return a tuple of functions (encoder, decoder, stream_reader, stream_writer) taking the following arguments:
In case a search function cannot find a given encoding, it should return None.
Aliasing support for encodings is left to the search functions to implement.
The codecs module will maintain an encoding cache for performance reasons. Encodings are first looked up in the cache. If not found, the list of registered search functions is scanned. If no codecs tuple is found, a LookupError is raised. Otherwise, the codecs tuple is stored in the cache and returned to the caller.
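A minimal sketch of a search function that implements one alias and declines everything else, assuming the codecs module as it exists in Python 3. The alias name `sketch_latin1` and the function name `sketch_search` are invented for this illustration.

```python
import codecs

def sketch_search(name):
    """Search function: resolve one made-up alias, decline everything else."""
    if name == "sketch_latin1":          # hypothetical alias, not a real codec name
        return codecs.lookup("latin-1")  # reuse the entry of the standard codec
    return None                          # unknown encoding: let other functions try

codecs.register(sketch_search)

# The alias is now usable wherever an encoding name is accepted.
encoded = "caf\u00e9".encode("sketch_latin1")
```

Because of the cache described above, the search function should only be consulted the first time the alias is looked up.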
To query the Codec instance, the following API should be used:
codecs.lookup(encoding)
This will either return the codecs tuple that was found or raise a LookupError.
Standard codecs should live inside an encodings/ package directory in the standard Python code library. The __init__.py file of that directory should include a Codec Lookup compatible search function implementing a lazy, module based codec lookup.
Python should provide a few standard codecs for the most relevant encodings, e.g.
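Both outcomes of a lookup can be sketched as below. This assumes Python 3, where the returned object still behaves as a 4-tuple whose first two entries are the stateless encoder and decoder; the nonsense encoding name is made up to force the failure case.

```python
import codecs

info = codecs.lookup("latin-1")        # a common alias of 'iso-8859-1'
encode, decode = info[0], info[1]      # first two entries of the codecs tuple
data, consumed = encode("\u00e9t\u00e9")

try:
    codecs.lookup("no-such-encoding-for-this-sketch")
    found = True
except LookupError:
    found = False
```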
    'utf-8':              8-bit variable length encoding
    'utf-16':             16-bit variable length encoding (little/big endian)
    'utf-16-le':          utf-16 but explicitly little endian
    'utf-16-be':          utf-16 but explicitly big endian
    'ascii':              7-bit ASCII codepage
    'iso-8859-1':         ISO 8859-1 (Latin 1) codepage
    'unicode-escape':     see Unicode Constructors for a definition
    'raw-unicode-escape': see Unicode Constructors for a definition
    'native':             dump of the Internal Format used by Python
Common aliases should also be provided by default, e.g. 'latin-1' for 'iso-8859-1'.
Note: 'utf-16' should be implemented by using and requiring byte order marks (BOM) for file input/output.
All other encodings, such as the CJK ones to support Asian scripts, should be implemented in separate packages which do not get included in the core Python distribution and are not a part of this proposal.
The following base classes should be defined in the module "codecs". They provide not only templates for use by encoding module implementors, but also define the interface which is expected by the Unicode implementation.
Note that the Codec Interface defined here is well suited to a larger range of applications. The Unicode implementation expects Unicode objects on input for .encode() and .write(), and character buffer compatible objects on input for .decode(). Output of .encode() and .read() should be a Python string, and .decode() must return a Unicode object.
First, we have the stateless encoders/decoders. These do not work in chunks as the stream codecs (see below) do, because all components are expected to be available in memory.
    class Codec:

        """ Defines the interface for stateless encoders/decoders.

            The .encode()/.decode() methods may implement different
            error handling schemes by providing the errors argument.
            These string values are defined:

             'strict'  - raise a ValueError (or a subclass)
             'ignore'  - ignore the character and continue with the next
             'replace' - replace with a suitable replacement character;
                         Python will use the official U+FFFD REPLACEMENT
                         CHARACTER for the builtin Unicode codecs.

        """
        def encode(self,input,errors='strict'):

            """ Encodes the object input and returns a tuple (output
                object, length consumed).

                errors defines the error handling to apply. It defaults
                to 'strict' handling.

                The method may not store state in the Codec instance.
                Use StreamCodec for codecs which have to keep state in
                order to make encoding/decoding efficient.

            """
            ...

        def decode(self,input,errors='strict'):

            """ Decodes the object input and returns a tuple (output
                object, length consumed).

                input must be an object which provides the bf_getreadbuf
                buffer slot. Python strings, buffer objects and memory
                mapped files are examples of objects providing this slot.

                errors defines the error handling to apply. It defaults
                to 'strict' handling.

                The method may not store state in the Codec instance.
                Use StreamCodec for codecs which have to keep state in
                order to make encoding/decoding efficient.

            """
            ...
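The three error handling schemes named in the class docstring can be compared side by side. This is a sketch under the assumption of Python 3's built-in ASCII codec; the variable names are illustrative only.

```python
text = "a\u1234b"

ignored = text.encode("ascii", "ignore")        # drops the unencodable character
replaced = text.encode("ascii", "replace")      # substitutes '?' when encoding
decoded = b"a\xffb".decode("ascii", "replace")  # substitutes U+FFFD when decoding

try:
    text.encode("ascii", "strict")
    strict_error = None
except ValueError as exc:                       # UnicodeError derives from ValueError
    strict_error = exc
```

Catching ValueError is enough here because, as stated earlier, all Unicode encoding and decoding exceptions are meant to be subclasses of UnicodeError, which itself derives from ValueError.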
StreamWriter and StreamReader define the interface for stateful encoders/decoders which work on streams. These allow the data to be processed in chunks, so that memory is used efficiently. If you have large strings in memory, you may want to wrap them with cStringIO objects and then use these codecs on them, so that you can do chunk processing as well, e.g. to provide progress information to the user.
    class StreamWriter(Codec):

        def __init__(self,stream,errors='strict'):

            """ Creates a StreamWriter instance.

                stream must be a file-like object open for writing
                (binary) data.

                The StreamWriter may implement different error handling
                schemes by providing the errors keyword argument. These
                parameters are defined:

                 'strict'  - raise a ValueError (or a subclass)
                 'ignore'  - ignore the character and continue with the next
                 'replace' - replace with a suitable replacement character

            """
            self.stream = stream
            self.errors = errors

        def write(self,object):

            """ Writes the object's contents encoded to self.stream.
            """
            data, consumed = self.encode(object,self.errors)
            self.stream.write(data)

        def writelines(self, list):

            """ Writes the concatenated list of strings to the stream
                using .write().
            """
            self.write(''.join(list))

        def reset(self):

            """ Flushes and resets the codec buffers used for keeping state.

                Calling this method should ensure that the data on the
                output is put into a clean state that allows appending
                of new fresh data without having to rescan the whole
                stream to recover state.

            """
            pass

        def __getattr__(self,name, getattr=getattr):

            """ Inherit all other methods from the underlying stream.
            """
            return getattr(self.stream,name)

    class StreamReader(Codec):

        def __init__(self,stream,errors='strict'):

            """ Creates a StreamReader instance.

                stream must be a file-like object open for reading
                (binary) data.

                The StreamReader may implement different error handling
                schemes by providing the errors keyword argument. These
                parameters are defined:

                 'strict'  - raise a ValueError (or a subclass)
                 'ignore'  - ignore the character and continue with the next
                 'replace' - replace with a suitable replacement character

            """
            self.stream = stream
            self.errors = errors

        def read(self,size=-1):

            """ Decodes data from the stream self.stream and returns the
                resulting object.

                size indicates the approximate maximum number of bytes to
                read from the stream for decoding purposes. The decoder
                can modify this setting as appropriate. The default value
                -1 indicates to read and decode as much as possible.
                size is intended to prevent having to decode huge files
                in one step.

                The method should use a greedy read strategy, meaning that
                it should read as much data as is allowed within the
                definition of the encoding and the given size, e.g. if
                optional encoding endings or state markers are available
                on the stream, these should be read too.

            """
            # Unsliced reading:
            if size < 0:
                return self.decode(self.stream.read())[0]

            # Sliced reading:
            read = self.stream.read
            decode = self.decode
            data = read(size)
            i = 0
            while 1:
                try:
                    object, decodedbytes = decode(data)
                except ValueError,why:
                    # This method is slow but should work under pretty
                    # much all conditions; at most 10 tries are made
                    i = i + 1
                    newdata = read(1)
                    if not newdata or i > 10:
                        raise
                    data = data + newdata
                else:
                    return object

        def readline(self, size=None):

            """ Read one line from the input stream and return the
                decoded data.

                Note: Unlike the .readlines() method, this method inherits
                the line breaking knowledge from the underlying stream's
                .readline() method -- there is currently no support for
                line breaking using the codec decoder due to lack of line
                buffering. Subclasses should however, if possible, try to
                implement this method using their own knowledge of line
                breaking.

                size, if given, is passed as size argument to the stream's
                .readline() method.

            """
            if size is None:
                line = self.stream.readline()
            else:
                line = self.stream.readline(size)
            return self.decode(line)[0]

        def readlines(self, sizehint=None):

            """ Read all lines available on the input stream
                and return them as list of lines.

                Line breaks are implemented using the codec's decoder
                method and are included in the list entries.

                sizehint, if given, is passed as size argument to the
                stream's .read() method.

            """
            # sizehint defaults to None so that the unbounded read below
            # is taken when no hint is given.
            if sizehint is None:
                data = self.stream.read()
            else:
                data = self.stream.read(sizehint)
            return self.decode(data)[0].splitlines(1)

        def reset(self):

            """ Resets the codec buffers used for keeping state.

                Note that no stream repositioning should take place.
                This method is primarily intended to be able to recover
                from decoding errors.

            """
            pass

        def __getattr__(self,name, getattr=getattr):

            """ Inherit all other methods from the underlying stream.
            """
            return getattr(self.stream,name)
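How the two stream classes are meant to be used together can be sketched with the factories that Python 3's codecs module provides, codecs.getwriter() and codecs.getreader(). That these factories match the interface above is an assumption of this sketch, and io.BytesIO merely stands in for a binary file.

```python
import codecs
import io

raw = io.BytesIO()                        # stands in for a file opened in binary mode
writer = codecs.getwriter("utf-8")(raw)   # StreamWriter wrapping the stream
writer.write("gr\u00fc\u00df\n")
writer.writelines(["one\n", "two\n"])

reader = codecs.getreader("utf-8")(io.BytesIO(raw.getvalue()))
first = reader.readline()                 # decoded text, line end included
rest = reader.readlines()                 # remaining lines as a list
```

The writer only ever hands encoded data to the underlying stream, and the reader only ever hands decoded text to the caller, which is the division of labour the base classes describe.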
Stream codec implementors are free to combine the StreamWriter and StreamReader interfaces into one class. Even combining all of these with the Codec class should be possible.
Implementors are free to add additional methods to enhance the codec functionality or to provide extra state information needed for the methods to work. The internal codec implementation will only use the above interfaces, though.
The Unicode implementation is not required to use these base classes; only the interfaces must match. This allows Codecs to be written as extension types.
As a guideline, large mapping tables should be implemented using static C data in separate (shared) extension modules. That way multiple processes can share the same data.
A tool to auto-convert Unicode mapping files to mapping modules should be provided to simplify support for additional mappings (see References).
The .split() method will have to know what is considered whitespace in Unicode.
Case conversion is rather complicated with Unicode data, since there are many different conditions to respect. See http://www.unicode.org/unicode/reports/tr21/ for some guidelines on implementing case conversion.
For Python, we should only implement the 1-1 conversions included in Unicode. Locale dependent and other special case conversions (see the Unicode standard file SpecialCasing.txt) should be left to user land routines and should not go into the core interpreter.
The methods .capitalize() and .iscapitalized() should follow the case mapping algorithm defined in the above technical report as closely as possible.
Line breaking should be done for all Unicode characters having the B property, as well as for the combinations CRLF, CR, LF (interpreted in that order) and other special line separators defined by the standard. See http://www.unicode.org/unicode/reports/tr13/ for some guidelines on line breaks and newline handling.
The Unicode type should provide a .splitlines() method which returns a list of lines according to the above specification. See Unicode Methods.
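The line breaking rules can be illustrated with the str.splitlines() method of Python 3, which is assumed here to be the descendant of the proposed method. The sample text mixes CRLF, CR, LF and the Unicode LINE SEPARATOR (U+2028).

```python
text = "one\r\ntwo\rthree\nfour\u2028five"

lines = text.splitlines()        # line ends stripped
kept = text.splitlines(True)     # line ends kept, as with include_breaks
```

Note that CRLF is treated as a single break, which matches the "interpreted in that order" rule above.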
A separate module "unicodedata" should provide a compact interface to all Unicode character properties defined in the standard's UnicodeData.txt file.
Among other things, these properties provide ways to recognize numbers, digits, spaces, whitespace, etc.
Since this module will have to provide access to all Unicode characters, it will eventually have to contain the data from UnicodeData.txt, which takes up around 600kB. For this reason, the data should be stored in static C data. This enables compilation as a shared module which the underlying OS can share between processes (unlike normal Python code modules).
There should be a standard Python interface for accessing this information, so that other implementors can plug in their own, possibly enhanced, versions, e.g. ones that decompress the data on the fly.
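A few property lookups give a feel for the kind of interface meant here. The sketch uses the unicodedata module of Python 3 and assumes its functions correspond to the interface this proposal asks for.

```python
import unicodedata

category = unicodedata.category("A")        # general category of LATIN CAPITAL LETTER A
digit = unicodedata.digit("\u0663")         # ARABIC-INDIC DIGIT THREE
half = unicodedata.numeric("\u00bd")        # VULGAR FRACTION ONE HALF
bidi = unicodedata.bidirectional("\u2029")  # PARAGRAPH SEPARATOR
```

The last lookup returns the bidirectional class, which is where the "B property" mentioned in the line breaking rules comes from.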
Support for these is left to user land Codecs and is not explicitly integrated into the core. Note that, because of the way the Internal Format is implemented, only the area between \uE000 and \uF8FF is usable for private encodings.
The internal format for Unicode objects should use a Python specific fixed format <PythonUnicode>, implemented as 'unsigned short' (or another unsigned numeric type having 16 bits). Byte order is platform dependent.
This format will hold UTF-16 encodings of the corresponding Unicode ordinals. The Python Unicode implementation will address these values as if they were UCS-2 values. UCS-2 and UTF-16 are the same for all currently defined Unicode character points. UTF-16 without surrogates provides access to about 64k characters and covers all characters in the Basic Multilingual Plane (BMP) of Unicode.
It is the Codec's responsibility to ensure that the data it passes to the Unicode object constructor respects this assumption. The constructor does not check the data for Unicode compliance or for the use of surrogates.
Future implementations can extend the 32 bit restriction to the full set of all UTF-16 addressable characters (around 1M characters).
The Unicode API should provide interface routines from <PythonUnicode> to the compiler's wchar_t, which can be 16 or 32 bit depending on the compiler/libc/platform being used.
Unicode objects should have a pointer to a cached Python string object <defenc> holding the object's value using the <default encoding>. This is needed for performance and internal parsing reasons (see Internal Argument Parsing). The buffer is filled when the first conversion request to the <default encoding> is issued on the object.
Interning is not needed (for now), since Python identifiers are defined as being ASCII only.
codecs.BOM should return the byte order mark (BOM) for the format used internally. The codecs module should provide the following additional constants for convenience and reference (codecs.BOM will be either BOM_BE or BOM_LE depending on the platform):
    BOM_BE: '\376\377'
      (corresponds to Unicode U+0000FEFF in UTF-16 on big endian
       platforms == ZERO WIDTH NO-BREAK SPACE)

    BOM_LE: '\377\376'
      (the byte-swapped mark used on little endian platforms; read in
       big endian order these bytes give U+0000FFFE == defined as
       being an illegal Unicode character)

    BOM4_BE: '\000\000\376\377'
      (corresponds to Unicode U+0000FEFF in UCS-4)

    BOM4_LE: '\377\376\000\000'
      (corresponds to Unicode U+0000FFFE in UCS-4)
Note that Unicode sees big endian byte order as being "correct". The swapped order is taken to be an indicator of a "wrong" format, hence the illegal character definition.
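The relation between the platform byte order and the BOM constants can be checked with a small sketch. It assumes the codecs module of Python 3, where the constants are byte strings; `native_bom` is a name introduced only for this example.

```python
import codecs
import sys

# Pick the constant that should match this platform's byte order.
native_bom = codecs.BOM_LE if sys.byteorder == "little" else codecs.BOM_BE

encoded = "A".encode("utf-16")       # the generic codec writes a BOM first
explicit = "A".encode("utf-16-be")   # explicit byte order: no BOM is written
```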
The configure script should provide aid in deciding whether Python can use the native wchar_t type or not (it has to be a 16-bit unsigned type).
Implement the buffer interface using the <defenc> Python string object as the basis for bf_getcharbuf and the internal buffer for bf_getreadbuf. If bf_getcharbuf is requested and the <defenc> object does not yet exist, it is created first.
Note that, as a special case, the parser marker "s#" will not return raw Unicode UTF-16 data (which bf_getreadbuf returns), but instead tries to encode the Unicode object using the default encoding and then returns a pointer to the resulting string object (or raises an exception in case the conversion fails). This was done in order to prevent accidentally writing binary data to an output stream which the other end might not recognize.
This has the advantage of being able to write to output streams (which typically use this interface) without additionally specifying the encoding to use.
If you need to access the read buffer interface of Unicode objects, use the PyObject_AsReadBuffer() interface.
The internal format can also be accessed using the 'unicode-internal' codec, e.g. via u.encode('unicode-internal').
Pickle and Marshal should have native Unicode object support. The objects should be encoded using platform independent encodings.
Marshal should use UTF-8, and Pickle should choose either Raw-Unicode-Escape (in text mode) or UTF-8 (in binary mode) as encoding. Using UTF-8 instead of UTF-16 has the advantage of eliminating the need to store a BOM mark.
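The choice of encodings can be observed in a round trip. This sketch assumes the CPython 3 pickle and marshal modules, with protocol 0 taken as the "text mode" and protocol 2 as one of the binary modes; the byte-level checks in the note below reflect CPython behaviour as far as we know it and are not guaranteed by this proposal.

```python
import marshal
import pickle

text = "\u1234"                                 # one non-ASCII character

text_pickle = pickle.dumps(text, protocol=0)    # text-mode protocol
binary_pickle = pickle.dumps(text, protocol=2)  # a binary protocol
marshalled = marshal.dumps(text)

# All three serialisations should restore the original value.
restored = (pickle.loads(text_pickle),
            pickle.loads(binary_pickle),
            marshal.loads(marshalled))
```

In the text pickle the character should appear as the escape \u1234, while the binary pickle and the marshal dump should contain its three UTF-8 bytes.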
Secret Labs AB is working on a Unicode-aware regular expression machinery. It works on plain 8-bit, UCS-2, and (optionally) UCS-4 internal character buffers.
See http://www.unicode.org/unicode/reports/tr18/ for some remarks on how to treat Unicode regular expressions.
Format markers are used in Python format strings. If Python strings are used as format strings, the following interpretations should be in effect:

    '%s': For Unicode objects this will cause coercion of the
          whole format string to Unicode. Note that you should use
          a Unicode format string to start with for performance
          reasons.

In case the format string is a Unicode object, all parameters are coerced to Unicode first and then put together and formatted according to the format string. Numbers are first converted to strings and then to Unicode.

    '%s': Python strings are interpreted as Unicode
          strings using the <default encoding>. Unicode objects are
          taken as is.

All other string formatters should work accordingly.
Example:

    u"%s %s" % (u"abc", "abc")  ==  u"abc abc"

These markers are used by the PyArg_ParseTuple() APIs:
    "U":   Check for a Unicode object and return a pointer to it.

    "s":   For Unicode objects: return a pointer to the object's
           <defenc> buffer (which uses the <default encoding>).

    "s#":  Access to the default encoded version of the Unicode object
           (see Buffer Interface); note that the length relates to the
           length of the default encoded string rather than to the
           length of the Unicode object.

    "t#":  Same as "s#".

    "es":  Takes two parameters: encoding (const char *) and
           buffer (char **).

           The input object is first coerced to Unicode in the usual
           way and then encoded into a string using the given encoding.

           On output, a buffer of the needed size is allocated and
           returned through *buffer as a NULL-terminated string. The
           encoded string may not contain embedded NULL characters.
           The caller is responsible for calling PyMem_Free() to free
           the allocated *buffer after usage.

    "es#": Takes three parameters: encoding (const char *),
           buffer (char **) and buffer_len (int *).

           The input object is first coerced to Unicode in the usual
           way and then encoded into a string using the given encoding.

           If *buffer is non-NULL, *buffer_len must be set to
           sizeof(buffer) on input. Output is then copied to *buffer.

           If *buffer is NULL, a buffer of the needed size is allocated
           and output copied into it. *buffer is then updated to point
           to the allocated memory area. The caller is responsible for
           calling PyMem_Free() to free the allocated *buffer after
           usage.

           In both cases *buffer_len is updated to the number of
           characters written (excluding the trailing NULL-byte).
           The output buffer is assured to be NULL-terminated.
Examples:

Using "es#" with auto-allocation:

    static PyObject *
    test_parser(PyObject *self,
                PyObject *args)
    {
        PyObject *str;
        const char *encoding = "latin-1";
        char *buffer = NULL;
        int buffer_len = 0;

        if (!PyArg_ParseTuple(args, "es#:test_parser",
                              encoding, &buffer, &buffer_len))
            return NULL;
        if (!buffer) {
            PyErr_SetString(PyExc_SystemError,
                            "buffer is NULL");
            return NULL;
        }
        str = PyString_FromStringAndSize(buffer, buffer_len);
        PyMem_Free(buffer);
        return str;
    }

Using "es" with auto-allocation returning a NULL-terminated string:

    static PyObject *
    test_parser(PyObject *self,
                PyObject *args)
    {
        PyObject *str;
        const char *encoding = "latin-1";
        char *buffer = NULL;

        if (!PyArg_ParseTuple(args, "es:test_parser",
                              encoding, &buffer))
            return NULL;
        if (!buffer) {
            PyErr_SetString(PyExc_SystemError,
                            "buffer is NULL");
            return NULL;
        }
        str = PyString_FromString(buffer);
        PyMem_Free(buffer);
        return str;
    }

Using "es#" with a pre-allocated buffer:

    static PyObject *
    test_parser(PyObject *self,
                PyObject *args)
    {
        PyObject *str;
        const char *encoding = "latin-1";
        char _buffer[10];
        char *buffer = _buffer;
        int buffer_len = sizeof(_buffer);

        if (!PyArg_ParseTuple(args, "es#:test_parser",
                              encoding, &buffer, &buffer_len))
            return NULL;
        if (!buffer) {
            PyErr_SetString(PyExc_SystemError,
                            "buffer is NULL");
            return NULL;
        }
        str = PyString_FromStringAndSize(buffer, buffer_len);
        return str;
    }
Since file.write(object) and most other stream writers use the "s#" or "t#" argument parsing marker for querying the data to write, the default encoded string version of the Unicode object will be written to the streams (see Buffer Interface).
For explicit handling of files using Unicode, the standard stream codecs as available through the codecs module should be used.
The codecs module should provide a short-cut open(filename,mode,encoding) which also ensures that mode contains the 'b' character when needed.
Only the user knows what encoding the input data uses, so no special magic is applied. The user will have to explicitly convert the string data to Unicode objects as needed, or use the file wrappers defined in the codecs module (see File/Stream Output).
All Python string methods, plus:
    .encode([encoding=<default encoding>][,errors="strict"])
       --> see Unicode Output

    .splitlines([include_breaks=0])
       --> breaks the Unicode string into a list of (Unicode) lines;
           returns the lines with line breaks included, if
           include_breaks is true. See Line Breaks for a
           specification of how line breaking is done.
We should use Fredrik Lundh's Unicode object implementation as a basis. It already implements most of the string methods needed and provides a well written code base which we can build upon.
The object sharing implemented in Fredrik's implementation should be dropped.
Test cases should follow those in Lib/test/test_string.py and include additional checks for the Codec Registry and the Standard Codecs.
References:

    Unicode Consortium:
        http://www.unicode.org/
    Unicode FAQ:
        http://www.unicode.org/unicode/faq/
    Unicode 3.0:
        http://www.unicode.org/unicode/standard/versions/Unicode3.0.html
    Unicode-TechReports:
        http://www.unicode.org/unicode/reports/techreports.html
    Unicode-Mappings:
        http://www.unicode.org/Public/MAPPINGS/
    Introduction to Unicode (a little outdated but still nice to read):
        http://www.nada.kth.se/i18n/ucs/unicode-iso10646-oview.html
    For comparison: Introducing Unicode to ECMAScript (aka JavaScript) --
        http://www-4.ibm.com/software/developer/library/internationalization-support.html
    IANA Character Set Names:
        ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets
    Discussion of UTF-8 and Unicode support for POSIX and Linux:
        http://www.cl.cam.ac.uk/~mgk25/unicode.html

    Encodings:
        Overview:
            http://czyborra.com/utf/
        UCS-2:
            http://www.uazone.com/multiling/unicode/ucs2.html
        UTF-7: defined in RFC2152, e.g.
            http://www.uazone.com/multiling/ml-docs/rfc2152.txt
        UTF-8: defined in RFC2279, e.g.
            http://info.internet.isi.edu/in-notes/rfc/files/rfc2279.txt
        UTF-16:
            http://www.uazone.com/multiling/unicode/wg2n1035.html

History of this Proposal:
-------------------------
1.8: Fixed some URLs to the unicode.org site.
1.7: Added note about the changed behaviour of "s#".
1.6: Changed <defencstr> to <defenc> since this is the name used in the
     implementation. Added notes about the usage of <defenc> in the
     buffer protocol implementation.
1.5: Added notes about setting the <default encoding>. Fixed some typos
     (thanks to Andrew Kuchling). Changed <defencstr> to <utf8str>.
1.4: Added note about mixed type comparisons and contains tests.
     Changed treating of Unicode objects in format strings (if used
     with '%s' % u they will now cause the format string to be
     coerced to Unicode, thus producing a Unicode object on return).
     Added link to IANA charset names (thanks to Lars Marius Garshol).
     Added new codec methods .readline(), .readlines() and .writelines().
1.3: Added new "es" and "es#" parser markers.
1.2: Removed POD about codecs.open().
1.1: Added note about comparisons and hash values. Added note about
     case mapping algorithms. Changed stream codecs .read() and
     .write() method to match the standard file-like object methods
     (bytes consumed information is no longer returned by the methods).
1.0: Changed encode Codec method to be symmetric to the decode method
     (they both return (object, data consumed) now and thus become
     interchangeable); removed __init__ method of Codec class (the
     methods are stateless) and moved the errors argument down to the
     methods; made the Codec design more generic w/r to type of input
     and output objects; changed StreamWriter.flush to
     StreamWriter.reset in order to avoid overriding the stream's
     .flush() method; renamed .breaklines() to .splitlines(); renamed
     the module unicodec to codecs; modified the File I/O section to
     refer to the stream codecs.
0.9: Changed errors keyword argument definition; added 'replace' error
     handling; changed the codec APIs to accept buffer like objects on
     input; some minor typo fixes; added Whitespace section and
     included references for Unicode characters that have the
     whitespace and the line break characteristic; added note that
     search functions can expect lower-case encoding names; dropped
     slicing and offsets in the codec APIs.
0.8: Added encodings package and raw unicode escape encoding;
     untabified the proposal; added notes on Unicode format strings;
     added .breaklines() method.
0.7: Added a whole new set of codec APIs; added a different encoder
     lookup scheme; fixed some names.
0.6: Changed "s#" to "t#"; changed <defencbuf> to <defencstr> holding
     a real Python string object; changed Buffer Interface to delegate
     requests to <defencstr>'s buffer interface; removed the explicit
     reference to the unicodec.codecs dictionary (the module can
     implement this in way fit for the purpose); removed the settable
     default encoding; move UnicodeError from unicodec to exceptions;
     "s#" now returns the internal data; passed the UCS-2/UTF-16
     checking from the Unicode constructor to the Codecs.
0.5: Moved sys.bom to unicodec.BOM; added sections on case mapping,
     private use encodings and Unicode character properties.
0.4: Added Codec interface, notes on %-formatting, changed some
     encoding details, added comments on stream wrappers, fixed some
     discussion points (most important: Internal Format), clarified
     the 'unicode-escape' encoding, added encoding references.
0.3: Added references, comments on codec modules, the internal format,
     bf_getcharbuffer and the RE engine; added 'unicode-escape'
     encoding proposed by Tim Peters and fixed repr(u) accordingly.
0.2: Integrated Guido's suggestions, added stream codecs and file
     wrapping.
0.1: First version.