Struct Wtf8
pub struct Wtf8 { /* private fields */ }
A borrowed slice of well-formed WTF-8 data.
Similar to &str, but can additionally contain surrogate code points
if they’re not in a surrogate pair.
Implementations
impl Wtf8
pub const fn from_str(value: &str) -> &Wtf8
Create a WTF-8 slice from a UTF-8 &str slice.
Since WTF-8 is a superset of UTF-8, this always succeeds.
pub fn slice(&self, begin: usize, end: usize) -> &Wtf8
Return a slice of the given string for the half-open byte range [begin, end).
Failure
Fails when begin or end does not fall on a code point boundary,
or when either points beyond the end of the string.
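The boundary rule shared by slice, slice_from and slice_to can be sketched over raw bytes. This is a minimal illustration using only the standard library, not the crate's implementation: `is_code_point_boundary` and `checked_slice` are hypothetical helpers, and the sketch returns None where the documented methods would fail.

```rust
/// Hypothetical helper (not part of the documented API): reports whether
/// `index` falls on a code point boundary of WTF-8 (or UTF-8) encoded bytes.
fn is_code_point_boundary(bytes: &[u8], index: usize) -> bool {
    if index == bytes.len() {
        return true; // the end of the data always counts as a boundary
    }
    match bytes.get(index) {
        // Continuation bytes have the bit pattern 10xxxxxx; every other
        // byte starts a new code point.
        Some(&b) => (b & 0xC0) != 0x80,
        None => false, // index lies beyond the end
    }
}

/// Hypothetical checked variant of byte-range slicing: returns None in the
/// cases where the documented `slice` method is described as failing.
fn checked_slice(bytes: &[u8], begin: usize, end: usize) -> Option<&[u8]> {
    if begin <= end
        && is_code_point_boundary(bytes, begin)
        && is_code_point_boundary(bytes, end)
    {
        Some(&bytes[begin..end])
    } else {
        None
    }
}
```

With the bytes of "a€b", offsets 0, 1, 4 and 5 are boundaries, while 2 and 3 fall inside the three-byte encoding of "€".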
pub fn slice_from(&self, begin: usize) -> &Wtf8
Return a slice of the given string from byte begin to its end.
Failure
Fails when begin is not at a code point boundary,
or is beyond the end of the string.
pub fn slice_to(&self, end: usize) -> &Wtf8
Return a slice of the given string from its beginning to byte end.
Failure
Fails when end is not at a code point boundary,
or is beyond the end of the string.
pub fn ascii_byte_at(&self, position: usize) -> u8
Return the code point at position if it is in the ASCII range,
or b'\xFF' otherwise.
Failure
Fails if position is beyond the end of the string.
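The sentinel works because the byte 0xFF never occurs in WTF-8 (or UTF-8) data, so it cannot be confused with a real ASCII byte. A minimal sketch of the described behaviour over a raw byte slice follows; the free function is a hypothetical stand-in, not the crate's code.

```rust
/// Hypothetical stand-in for the documented method, operating on raw bytes.
fn ascii_byte_at(bytes: &[u8], position: usize) -> u8 {
    // Indexing panics when `position` is out of range, mirroring the
    // documented failure case.
    match bytes[position] {
        b if b.is_ascii() => b,
        // Lead and continuation bytes of multi-byte sequences map to 0xFF.
        _ => 0xFF,
    }
}
```

This makes it cheap to test for a specific ASCII byte (a path separator, for example) at a given position without decoding a full code point.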
pub fn code_points(&self) -> Wtf8CodePoints<'_>
Return an iterator for the string’s code points.
pub fn contains_char(&self, ch: char) -> bool
Returns true if this WTF-8 string contains the given character.
pub fn contains(&self, code_point: CodePoint) -> bool
Returns true if this WTF-8 string contains the given code point.
pub fn starts_with(&self, pattern: &str) -> bool
Returns true if this WTF-8 string starts with the given UTF-8 string.
pub fn as_str(&self) -> Option<&str>
Try to convert the string to UTF-8 and return a &str slice.
Return None if the string contains surrogates.
This does not copy the data.
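Because well-formed WTF-8 differs from UTF-8 only by permitting surrogate code points, the conversion can be sketched with the standard library's UTF-8 validator, which rejects surrogate encodings. The function `as_utf8` is hypothetical, and the crate may well detect surrogates in a cheaper way than full validation.

```rust
/// Hypothetical sketch: view well-formed WTF-8 bytes as `&str` when they
/// contain no surrogate code points. No data is copied.
fn as_utf8(wtf8_bytes: &[u8]) -> Option<&str> {
    // `std::str::from_utf8` rejects the three-byte surrogate encodings
    // (0xED followed by 0xA0..=0xBF), which is the only way well-formed
    // WTF-8 can fail to be UTF-8.
    std::str::from_utf8(wtf8_bytes).ok()
}
```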
pub fn to_string_lossy(&self) -> Cow<'_, str>
Lossily convert the string to UTF-8.
Return a borrowed UTF-8 &str slice if the contents are well-formed UTF-8.
Surrogates are replaced with "\u{FFFD}" (the replacement character
“�”).
This only copies the data if necessary (if it contains any surrogate).
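The borrow-or-copy behaviour can be illustrated with a sketch over raw bytes. It assumes the input is already well-formed WTF-8; the free function `to_string_lossy` below is a hypothetical stand-in written against the standard library only, not the crate's implementation.

```rust
use std::borrow::Cow;

/// Hypothetical sketch; assumes `bytes` is well-formed WTF-8.
fn to_string_lossy(bytes: &[u8]) -> Cow<'_, str> {
    // Fast path: without surrogates the bytes are already UTF-8, so the
    // result can borrow from the input.
    if let Ok(s) = std::str::from_utf8(bytes) {
        return Cow::Borrowed(s);
    }
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        // A surrogate is encoded as 0xED, then 0xA0..=0xBF, then one
        // continuation byte.
        if bytes[i] == 0xED
            && i + 2 < bytes.len()
            && (0xA0..=0xBF).contains(&bytes[i + 1])
        {
            out.extend_from_slice("\u{FFFD}".as_bytes());
            i += 3;
        } else {
            out.push(bytes[i]);
            i += 1;
        }
    }
    // `from_utf8_lossy` only guards against input that violated the
    // well-formedness assumption; for valid input it changes nothing.
    Cow::Owned(String::from_utf8_lossy(&out).into_owned())
}
```

Under these assumptions, the bytes [0x61, 0xED, 0xA0, 0x80, 0x62] produce an owned "a\u{FFFD}b", while plain ASCII input is returned as a borrow.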
pub fn to_ill_formed_utf16(&self) -> IllFormedUtf16CodeUnits<'_>
Convert the WTF-8 string to potentially ill-formed UTF-16 and return an iterator of 16-bit code units.
This is lossless:
calling Wtf8Buf::from_ill_formed_utf16 on the resulting code units
would always return the original WTF-8 string.
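The round trip can be sketched on plain code point values rather than on the crate's types. Both functions below are hypothetical helpers built on the standard library; the encoder assumes every input value is at most 0x10FFFF.

```rust
/// Hypothetical sketch: encode code points (possibly including lone
/// surrogates) as potentially ill-formed UTF-16.
fn encode_ill_formed_utf16(code_points: &[u32]) -> Vec<u16> {
    let mut units = Vec::new();
    for &cp in code_points {
        if cp < 0x1_0000 {
            // BMP code points, surrogates included, take a single unit.
            units.push(cp as u16);
        } else {
            // Supplementary code points become a lead/trail surrogate pair.
            let v = cp - 0x1_0000;
            units.push(0xD800 | (v >> 10) as u16);
            units.push(0xDC00 | (v & 0x3FF) as u16);
        }
    }
    units
}

/// Hypothetical inverse: decode UTF-16, keeping unpaired surrogates as
/// code points instead of replacing them.
fn decode_ill_formed_utf16(units: &[u16]) -> Vec<u32> {
    std::char::decode_utf16(units.iter().copied())
        .map(|result| match result {
            Ok(c) => c as u32,
            Err(e) => e.unpaired_surrogate() as u32,
        })
        .collect()
}
```

The round trip is lossless because well-formed WTF-8 never contains a lead surrogate immediately followed by a trail surrogate, so the decoder cannot merge two separately stored surrogates into one supplementary code point.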
pub fn to_uppercase(&self) -> Wtf8Buf
Returns the uppercase equivalent of this WTF-8 slice, as a new Wtf8Buf.
pub fn to_lowercase(&self) -> Wtf8Buf
Returns the lowercase equivalent of this WTF-8 slice, as a new Wtf8Buf.
pub fn from_bytes(bytes: &[u8]) -> Result<&Wtf8, &[u8]>
Create a WTF-8 slice from a WTF-8 encoded byte slice.
Returns Ok(&Wtf8) if the bytes are well-formed WTF-8, or
Err(bytes) with the original byte slice if validation fails.
This validates that:
- All bytes form valid UTF-8 sequences OR valid surrogate code point encodings
- Surrogate code points may appear unpaired and be encoded separately,
but if they are paired, they must be encoded as a single 4-byte UTF-8
sequence. For example, the byte sequence
[0xED, 0xA0, 0x80, 0xED, 0xB0, 0x80] is not valid WTF-8, because WTF-8 forbids encoding a surrogate pair as two separate 3-byte sequences.
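The two validation rules above can be sketched with the standard library alone. The function `is_well_formed_wtf8` is a hypothetical illustration, not the crate's validator: it lets `std::str::from_utf8` accept the UTF-8 parts and handles surrogate encodings by hand.

```rust
/// Hypothetical sketch of the validation rules described above.
fn is_well_formed_wtf8(mut bytes: &[u8]) -> bool {
    // True when a lead surrogate (U+D800..=U+DBFF) ended exactly where
    // `bytes` now begins.
    let mut after_lead = false;
    loop {
        match std::str::from_utf8(bytes) {
            Ok(_) => return true,
            Err(err) => {
                let rest = &bytes[err.valid_up_to()..];
                // Any valid UTF-8 in between separates two surrogates.
                if err.valid_up_to() > 0 {
                    after_lead = false;
                }
                match rest {
                    // Lead surrogate: 0xED, 0xA0..=0xAF, continuation byte.
                    [0xED, 0xA0..=0xAF, 0x80..=0xBF, tail @ ..] => {
                        after_lead = true;
                        bytes = tail;
                    }
                    // Trail surrogate: 0xED, 0xB0..=0xBF, continuation byte.
                    [0xED, 0xB0..=0xBF, 0x80..=0xBF, tail @ ..] => {
                        if after_lead {
                            // A pair split into two 3-byte sequences.
                            return false;
                        }
                        after_lead = false;
                        bytes = tail;
                    }
                    // Neither valid UTF-8 nor a surrogate encoding.
                    _ => return false,
                }
            }
        }
    }
}
```

In this sketch a lone lead surrogate, a lone trail surrogate, and a trail followed by a lead are all accepted; only a lead immediately followed by a trail is rejected, matching the example byte sequence above.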
pub const unsafe fn from_bytes_unchecked(bytes: &[u8]) -> &Wtf8
Create a WTF-8 slice from a WTF-8 encoded byte slice without checking that the bytes contain valid WTF-8.
For the safe version, see Wtf8::from_bytes.
Safety
The bytes passed in must be valid WTF-8. See Wtf8::from_bytes for the requirements.
Trait Implementations
impl Debug for Wtf8
Format the slice with double quotes,
and surrogates as \u followed by four hexadecimal digits in braces.
Example: "a\u{D800}" for a slice with code points [U+0061, U+D800]
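The output format can be approximated with a small sketch over code point values. The function `debug_format` is hypothetical; in particular, escaping non-surrogate characters with `char::escape_debug` and using uppercase hexadecimal digits are assumptions based on the example above, not confirmed details of the implementation.

```rust
/// Hypothetical sketch of the Debug output format described above.
fn debug_format(code_points: &[u32]) -> String {
    let mut out = String::from("\"");
    for &cp in code_points {
        match char::from_u32(cp) {
            // Assumption: scalar values are escaped with `char::escape_debug`.
            Some(c) => out.extend(c.escape_debug()),
            // `char::from_u32` rejects surrogates, so they are written as
            // \u{XXXX} with four uppercase hexadecimal digits.
            None => out.push_str(&format!("\\u{{{:04X}}}", cp)),
        }
    }
    out.push('"');
    out
}
```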