## Using the default encoding

To understand this better, let's start by looking at the default behaviour for text decoding...

```python
import httpx

# Instantiate a client with the default configuration.
client = httpx.Client()

# Using the client...
response = client.get(...)
print(response.encoding)  # Prints the charset given in the Content-Type
                          # header, or else "utf-8".
print(response.text)  # The text is decoded with the Content-Type charset,
                      # or else with "utf-8".
```

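This is normally absolutely fine. Most servers respond with a properly formatted Content-Type header that includes a charset. And in most cases where no charset is included, UTF-8 is very likely to be the encoding in use, since it is so widely adopted.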
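The fallback rule described above can be sketched with the standard library alone. This is an illustration of the rule, not the client's actual implementation: `resolve_encoding` is a hypothetical helper, and treating an unrecognised charset name as missing is an assumption of this sketch.

```python
import codecs
from email.message import Message
from typing import Optional


def resolve_encoding(content_type: Optional[str], default: str = "utf-8") -> str:
    """Return the charset named in a Content-Type value, or the default."""
    if not content_type:
        return default

    # Message provides a ready-made parser for header parameters.
    message = Message()
    message["Content-Type"] = content_type
    charset = message.get_param("charset")
    if not isinstance(charset, str) or not charset:
        return default

    # Assumption of this sketch: an unknown codec name counts as "no charset".
    try:
        codecs.lookup(charset)
    except LookupError:
        return default
    return charset


print(resolve_encoding("text/html; charset=ISO-8859-1"))
print(resolve_encoding("text/html"))
```

The first call uses the charset from the header; the second has none to use, so it falls back to the default.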

## Using an explicit encoding

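In some cases we might be making requests to a site whose server does not set any character set information explicitly, but where we know what the encoding is. In this case it's best to set the default encoding explicitly on the client.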

```python
import httpx

# Instantiate a client with a Japanese character set as the default encoding.
client = httpx.Client(default_encoding="shift-jis")

# Using the client...
response = client.get(...)
print(response.encoding)  # Prints the charset given in the Content-Type
                          # header, or else "shift-jis".
print(response.text)  # The text is decoded with the Content-Type charset,
                      # or else with "shift-jis".
```
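To see why the explicit default matters, here is a small standard-library sketch of what happens when Shift JIS bytes are decoded with the wrong codec. The sample text and the variable names are illustrative only, and `errors="replace"` is used here simply so the script keeps running instead of raising.

```python
# Bytes as a server might send them: Japanese text encoded as Shift JIS.
body = "こんにちは".encode("shift_jis")

# Decoding with the UTF-8 fallback does not recover the original text;
# undecodable bytes become U+FFFD replacement characters.
wrong = body.decode("utf-8", errors="replace")

# Decoding with the codec we know the site uses round-trips cleanly.
right = body.decode("shift_jis")

print(wrong == "こんにちは")
print(right == "こんにちは")
```

With a strict decoder the UTF-8 attempt would raise `UnicodeDecodeError` rather than produce replacement characters; either way the text is only recovered once the right codec is used.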

## Using auto-detection

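In cases where the server does not reliably include character set information, and where we don't know which encoding is being used, we can enable auto-detection to make a best-guess attempt when decoding from bytes to text.

To use auto-detection, set the `default_encoding` argument to a callable instead of a string. This callable should be a function that takes the input bytes as an argument and returns the character set to use for decoding those bytes to text.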
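As a rough illustration of the shape such a callable takes, here is a minimal standard-library sketch that tries a fixed list of codecs in order. The `guess_encoding` name and the `CANDIDATES` list are hypothetical, and this trial-decoding heuristic is far cruder than a real detector.

```python
from typing import Sequence

# Hypothetical candidate list: order matters, strictest codecs first.
# "latin-1" accepts any byte sequence, so it acts as the catch-all.
CANDIDATES: Sequence[str] = ("utf-8", "shift_jis", "latin-1")


def guess_encoding(content: bytes) -> str:
    """Return the first candidate codec that decodes the bytes without error."""
    for name in CANDIDATES:
        try:
            content.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    # Only reached if the candidate list has no catch-all codec.
    return "utf-8"


print(guess_encoding("hello".encode("utf-8")))
print(guess_encoding("こんにちは".encode("shift_jis")))
```

A function like this has the signature described above, so it could be passed as `default_encoding=guess_encoding` when constructing the client. Because many byte sequences are valid in more than one codec, a list-based guess is easily fooled, which is why a dedicated detection library is usually the better choice.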

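There are two widely used Python packages that both handle this functionality:

* [`chardet`](https://chardet.readthedocs.io/) - A well-established package, and a port of [the auto-detection code in Mozilla](https://www-archive.mozilla.org/projects/intl/chardet.html).
* [`charset-normalizer`](https://charset-normalizer.readthedocs.io/) - A newer package, motivated by `chardet`, that takes a different approach.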

Let's take a look at installing auto-detection using one of these packages...

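```shell
$ pip install httpx
$ pip install chardet
```

Once `chardet` is installed, we can configure a client to use character-set auto-detection.

```python
import httpx
import chardet


def autodetect(content):
    # chardet.detect() returns a dict; its "encoding" entry is the best guess.
    return chardet.detect(content).get("encoding")


# Using a client with character-set auto-detection enabled.
client = httpx.Client(default_encoding=autodetect)
response = client.get(...)
print(response.encoding)  # Prints the charset given in the Content-Type
                          # header, or else the auto-detected character set.
print(response.text)
```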