pub struct Wtf8Buf { /* private fields */ }
An owned, growable string of well-formed WTF-8 data.
Similar to String, but can additionally contain surrogate code points
if they’re not in a surrogate pair.
Implementations§
impl Wtf8Buf
pub fn with_capacity(n: usize) -> Wtf8Buf
Create a new, empty WTF-8 string with pre-allocated capacity for n
bytes.
pub fn from_string(string: String) -> Wtf8Buf
Create a WTF-8 string from a UTF-8 String.
This takes ownership of the String and does not copy.
Since WTF-8 is a superset of UTF-8, this always succeeds.
pub fn from_str(s: &str) -> Wtf8Buf
Create a WTF-8 string from a UTF-8 &str slice.
This copies the content of the slice.
Since WTF-8 is a superset of UTF-8, this always succeeds.
pub fn from_ill_formed_utf16(v: &[u16]) -> Wtf8Buf
Create a WTF-8 string from a potentially ill-formed UTF-16 slice of 16-bit code units.
This is lossless: calling .to_ill_formed_utf16() on the resulting
string will always return the original code units.
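The round trip can be illustrated without this crate. The following is a minimal sketch using only the standard library, not the crate's implementation: `push_code_point`, `utf16_to_wtf8`, and `wtf8_to_utf16` are hypothetical helper names, and the decoder assumes its input is well-formed WTF-8.

```rust
/// Encode one code point (surrogates allowed) as generalized UTF-8.
fn push_code_point(out: &mut Vec<u8>, cp: u32) {
    match cp {
        0..=0x7F => out.push(cp as u8),
        0x80..=0x7FF => out.extend_from_slice(&[
            0xC0 | (cp >> 6) as u8,
            0x80 | (cp & 0x3F) as u8,
        ]),
        0x800..=0xFFFF => out.extend_from_slice(&[
            0xE0 | (cp >> 12) as u8,
            0x80 | ((cp >> 6) & 0x3F) as u8,
            0x80 | (cp & 0x3F) as u8,
        ]),
        _ => out.extend_from_slice(&[
            0xF0 | (cp >> 18) as u8,
            0x80 | ((cp >> 12) & 0x3F) as u8,
            0x80 | ((cp >> 6) & 0x3F) as u8,
            0x80 | (cp & 0x3F) as u8,
        ]),
    }
}

/// Convert potentially ill-formed UTF-16 to WTF-8 bytes.
/// Valid surrogate pairs become one 4-byte sequence; unpaired
/// surrogates are kept as 3-byte sequences instead of being rejected.
fn utf16_to_wtf8(units: &[u16]) -> Vec<u8> {
    let mut out = Vec::with_capacity(units.len());
    for item in std::char::decode_utf16(units.iter().copied()) {
        match item {
            Ok(c) => push_code_point(&mut out, c as u32),
            Err(e) => push_code_point(&mut out, e.unpaired_surrogate() as u32),
        }
    }
    out
}

/// Payload bits of a continuation byte.
fn cont(b: u8) -> u32 {
    (b & 0x3F) as u32
}

/// Convert WTF-8 bytes back to UTF-16 code units.
/// Assumes `bytes` is well-formed WTF-8 (no validation is done here).
fn wtf8_to_utf16(bytes: &[u8]) -> Vec<u16> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        let b = bytes[i];
        let (cp, len) = if b < 0x80 {
            (b as u32, 1)
        } else if b < 0xE0 {
            (((b as u32 & 0x1F) << 6) | cont(bytes[i + 1]), 2)
        } else if b < 0xF0 {
            (((b as u32 & 0x0F) << 12) | (cont(bytes[i + 1]) << 6) | cont(bytes[i + 2]), 3)
        } else {
            (((b as u32 & 0x07) << 18)
                | (cont(bytes[i + 1]) << 12)
                | (cont(bytes[i + 2]) << 6)
                | cont(bytes[i + 3]), 4)
        };
        i += len;
        if cp >= 0x10000 {
            // Supplementary code point: split back into a surrogate pair.
            let v = cp - 0x10000;
            out.push(0xD800 + (v >> 10) as u16);
            out.push(0xDC00 + (v & 0x3FF) as u16);
        } else {
            out.push(cp as u16);
        }
    }
    out
}

fn main() {
    // 'a', an unpaired lead surrogate, a valid pair, an unpaired trail surrogate.
    let units: [u16; 5] = [0x0061, 0xD800, 0xD83D, 0xDCA9, 0xDC00];
    let wtf8 = utf16_to_wtf8(&units);
    assert_eq!(wtf8_to_utf16(&wtf8), units);
}
```

Because pairs are always merged on the way in, a lead surrogate is never stored directly before a trail surrogate, which is what makes the reverse conversion unambiguous.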
pub fn reserve(&mut self, additional: usize)
Reserves capacity for at least additional more bytes to be inserted
in the given Wtf8Buf.
The collection may reserve more space to avoid frequent reallocations.
§Panics
Panics if the new capacity overflows usize.
pub fn capacity(&self) -> usize
Returns the number of bytes that this string buffer can hold without reallocating.
pub fn push_wtf8(&mut self, other: &Wtf8)
Append a WTF-8 slice at the end of the string.
This replaces newly paired surrogates at the boundary with a supplementary code point, like concatenating ill-formed UTF-16 strings effectively would.
pub fn push(&mut self, code_point: CodePoint)
Append a code point at the end of the string.
This replaces newly paired surrogates at the boundary with a supplementary code point, like concatenating ill-formed UTF-16 strings effectively would.
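The replacement at the boundary follows the usual UTF-16 surrogate arithmetic. As a rough illustration (the helper name `combine_surrogates` is hypothetical and not part of this API), a lead and a trail surrogate combine into one supplementary code point like this:

```rust
/// Combine a lead (U+D800..=U+DBFF) and trail (U+DC00..=U+DFFF) surrogate
/// into the supplementary code point they represent in UTF-16.
/// Returns None if either argument is outside its range.
fn combine_surrogates(lead: u16, trail: u16) -> Option<char> {
    if !(0xD800..=0xDBFF).contains(&lead) || !(0xDC00..=0xDFFF).contains(&trail) {
        return None;
    }
    let high = ((lead - 0xD800) as u32) << 10; // upper 10 bits
    let low = (trail - 0xDC00) as u32; // lower 10 bits
    std::char::from_u32(0x10000 + high + low)
}

fn main() {
    // U+D83D followed by U+DCA9 is the pair for U+1F4A9.
    assert_eq!(combine_surrogates(0xD83D, 0xDCA9), Some('\u{1F4A9}'));
    // A non-surrogate in the lead position is not a pair.
    assert_eq!(combine_surrogates(0x0061, 0xDCA9), None);
}
```

So a string ending in U+D83D, once U+DCA9 is appended, would be expected to hold the single code point U+1F4A9 rather than two surrogates.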
pub fn truncate(&mut self, new_len: usize)
Shortens a string to the specified length.
§Failure
Fails if new_len > current length,
or if new_len is not a code point boundary.
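What counts as a code point boundary can be sketched on raw bytes. This is an illustration under the assumption that the bytes are well-formed WTF-8; `is_code_point_boundary` is a hypothetical free function here, not a method this page documents.

```rust
/// Return true if `index` is the start of a code point or the end of the data.
/// In UTF-8 and WTF-8 alike, continuation bytes have the bit pattern 10xxxxxx,
/// so any other byte starts a new code point.
fn is_code_point_boundary(bytes: &[u8], index: usize) -> bool {
    if index == bytes.len() {
        return true;
    }
    match bytes.get(index) {
        Some(&b) => (b & 0xC0) != 0x80,
        None => false, // past the end
    }
}

fn main() {
    let bytes = "a\u{E9}".as_bytes(); // [0x61, 0xC3, 0xA9]
    assert!(is_code_point_boundary(bytes, 1)); // start of the 2-byte sequence
    assert!(!is_code_point_boundary(bytes, 2)); // inside it
}
```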
pub fn into_string(self) -> Result<String, Wtf8Buf>
Consume the WTF-8 string and try to convert it to UTF-8.
This does not copy the data.
If the contents are not well-formed UTF-8 (that is, if the string contains surrogates), the original WTF-8 string is returned instead.
pub fn into_string_lossy(self) -> String
Consume the WTF-8 string and convert it lossily to UTF-8.
This does not copy the data (but may overwrite parts of it in place).
Surrogates are replaced with "\u{FFFD}" (the replacement character
“�”).
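In-place replacement is possible because a surrogate occupies three bytes in WTF-8 and U+FFFD occupies three bytes in UTF-8. A minimal sketch of that idea, assuming the input is well-formed WTF-8 (the function name `replace_surrogates_in_place` is made up for illustration and is not the crate's code):

```rust
/// Overwrite every 3-byte surrogate sequence with the 3 bytes of U+FFFD,
/// then reinterpret the buffer as a String without reallocating.
/// Assumes `bytes` is well-formed WTF-8.
fn replace_surrogates_in_place(mut bytes: Vec<u8>) -> String {
    let replacement = "\u{FFFD}".as_bytes(); // [0xEF, 0xBF, 0xBD]
    let mut i = 0;
    while i + 2 < bytes.len() {
        // 0xED followed by 0xA0..=0xBF encodes U+D800..=U+DFFF.
        if bytes[i] == 0xED && bytes[i + 1] >= 0xA0 {
            bytes[i..i + 3].copy_from_slice(replacement);
            i += 3;
        } else {
            i += 1;
        }
    }
    String::from_utf8(bytes).expect("input was assumed to be well-formed WTF-8")
}

fn main() {
    // 'a', U+D800 encoded as ED A0 80, 'b'
    let input = vec![b'a', 0xED, 0xA0, 0x80, b'b'];
    assert_eq!(replace_surrogates_in_place(input), "a\u{FFFD}b");
}
```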
pub fn from_bytes(bytes: Vec<u8>) -> Result<Self, Vec<u8>>
Create a Wtf8Buf from a WTF-8 encoded byte vector.
Returns Ok(Wtf8Buf) if the bytes are well-formed WTF-8, or
Err(bytes) with the original bytes if validation fails.
This validates that:
- All bytes form valid UTF-8 sequences or valid surrogate code point encodings.
- Surrogate code points may appear unpaired and be encoded separately, but if they are paired, they must be encoded as a single 4-byte UTF-8 sequence. For example, the byte sequence [0xED, 0xA0, 0x80, 0xED, 0xB0, 0x80] is not valid WTF-8 because WTF-8 forbids encoding a surrogate pair as two separate 3-byte sequences.
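The two rules above can be sketched with the standard library's UTF-8 validator. This is an illustrative approximation, not the crate's actual validation code, and `is_well_formed_wtf8` is a hypothetical name:

```rust
/// Check the two WTF-8 rules: everything is valid UTF-8 except for
/// 3-byte surrogate encodings, and a lead surrogate is never
/// immediately followed by a trail surrogate.
fn is_well_formed_wtf8(mut bytes: &[u8]) -> bool {
    // True if the code point ending exactly at the current position
    // was a lead surrogate (U+D800..=U+DBFF).
    let mut prev_was_lead = false;
    loop {
        match std::str::from_utf8(bytes) {
            Ok(_) => return true,
            Err(e) => {
                if e.valid_up_to() > 0 {
                    // Ordinary text separates us from any earlier surrogate.
                    prev_was_lead = false;
                }
                match &bytes[e.valid_up_to()..] {
                    // ED A0..BF 80..BF encodes a surrogate U+D800..=U+DFFF.
                    [0xED, second @ 0xA0..=0xBF, 0x80..=0xBF, tail @ ..] => {
                        let is_lead = *second < 0xB0;
                        if prev_was_lead && !is_lead {
                            return false; // pair split into two 3-byte sequences
                        }
                        prev_was_lead = is_lead;
                        bytes = tail;
                    }
                    _ => return false, // some other invalid sequence
                }
            }
        }
    }
}

fn main() {
    assert!(is_well_formed_wtf8(&[0xED, 0xA0, 0x80])); // lone lead surrogate
    assert!(!is_well_formed_wtf8(&[0xED, 0xA0, 0x80, 0xED, 0xB0, 0x80])); // split pair
}
```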
pub unsafe fn from_bytes_unchecked(bytes: Vec<u8>) -> Self
Create a Wtf8Buf from a WTF-8 encoded byte vector without checking that the bytes contain valid WTF-8.
For the safe version, see Wtf8Buf::from_bytes.
§Safety
The bytes passed in must be valid WTF-8. See Wtf8Buf::from_bytes for the requirements.
Methods from Deref<Target = Wtf8>§
pub fn slice(&self, begin: usize, end: usize) -> &Wtf8
Return a slice of the given string for the byte range [begin..end).
§Failure
Fails when begin and end do not point to code point boundaries,
or point beyond the end of the string.
pub fn slice_from(&self, begin: usize) -> &Wtf8
Return a slice of the given string from byte begin to its end.
§Failure
Fails when begin is not at a code point boundary,
or is beyond the end of the string.
pub fn slice_to(&self, end: usize) -> &Wtf8
Return a slice of the given string from its beginning to byte end.
§Failure
Fails when end is not at a code point boundary,
or is beyond the end of the string.
pub fn ascii_byte_at(&self, position: usize) -> u8
Return the code point at position if it is in the ASCII range,
or b'\xFF' otherwise.
§Failure
Fails if position is beyond the end of the string.
pub fn code_points(&self) -> Wtf8CodePoints<'_>
Return an iterator for the string’s code points.
pub fn contains_char(&self, ch: char) -> bool
Returns true if this WTF-8 string contains the given character.
pub fn contains(&self, code_point: CodePoint) -> bool
Returns true if this WTF-8 string contains the given code point.
pub fn starts_with(&self, pattern: &str) -> bool
Returns true if this WTF-8 string starts with the given UTF-8 string.
pub fn as_str(&self) -> Option<&str>
Try to convert the string to UTF-8 and return a &str slice.
Return None if the string contains surrogates.
This does not copy the data.
pub fn to_string_lossy(&self) -> Cow<'_, str>
Lossily convert the string to UTF-8.
Return a UTF-8 &str slice if the contents are well-formed in UTF-8.
Surrogates are replaced with "\u{FFFD}" (the replacement character
“�”).
This only copies the data if necessary (if it contains any surrogate).
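The borrow-when-possible behavior can be approximated on raw bytes as follows. This is a sketch under the assumption that the input is well-formed WTF-8; `to_string_lossy_sketch` is a hypothetical name and the crate's real implementation may differ.

```rust
use std::borrow::Cow;

/// Borrow the input when it is already UTF-8; otherwise copy it,
/// writing U+FFFD in place of each 3-byte surrogate sequence.
/// Assumes `bytes` is well-formed WTF-8.
fn to_string_lossy_sketch(bytes: &[u8]) -> Cow<'_, str> {
    // Fast path: no surrogates, so no copy is needed.
    if let Ok(s) = std::str::from_utf8(bytes) {
        return Cow::Borrowed(s);
    }
    let mut out = String::with_capacity(bytes.len());
    let mut rest = bytes;
    loop {
        match std::str::from_utf8(rest) {
            Ok(s) => {
                out.push_str(s);
                return Cow::Owned(out);
            }
            Err(e) => {
                let (valid, tail) = rest.split_at(e.valid_up_to());
                // `valid` was just checked by from_utf8, so this cannot fail.
                out.push_str(std::str::from_utf8(valid).unwrap());
                out.push('\u{FFFD}');
                // In well-formed WTF-8 the only non-UTF-8 sequences are
                // 3-byte surrogates, so skip exactly three bytes.
                rest = &tail[3..];
            }
        }
    }
}

fn main() {
    assert!(matches!(to_string_lossy_sketch(b"hello"), Cow::Borrowed("hello")));
    let lossy = to_string_lossy_sketch(&[b'a', 0xED, 0xA0, 0x80, b'b']);
    assert!(matches!(lossy, Cow::Owned(_)));
    assert_eq!(lossy, "a\u{FFFD}b");
}
```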
pub fn to_ill_formed_utf16(&self) -> IllFormedUtf16CodeUnits<'_>
Convert the WTF-8 string to potentially ill-formed UTF-16 and return an iterator of 16-bit code units.
This is lossless:
calling Wtf8Buf::from_ill_formed_utf16 on the resulting code units
would always return the original WTF-8 string.
pub fn to_uppercase(&self) -> Wtf8Buf
Returns the uppercase equivalent of this WTF-8 slice, as a new Wtf8Buf.
pub fn to_lowercase(&self) -> Wtf8Buf
Returns the lowercase equivalent of this WTF-8 slice, as a new Wtf8Buf.
Trait Implementations§
impl Debug for Wtf8Buf
Format the string with double quotes,
and surrogates as \u followed by four hexadecimal digits.
Example: "a\u{D800}" for a string with code points [U+0061, U+D800]
impl Extend<CodePoint> for Wtf8Buf
Append code points from an iterator to the string.
This replaces surrogate code point pairs with supplementary code points, like concatenating ill-formed UTF-16 strings effectively would.
fn extend<T: IntoIterator<Item = CodePoint>>(&mut self, iterable: T)
fn extend_one(&mut self, item: A)
This is a nightly-only experimental API. (extend_one)
fn extend_reserve(&mut self, additional: usize)
This is a nightly-only experimental API. (extend_one)
impl FromIterator<CodePoint> for Wtf8Buf
Create a new WTF-8 string from an iterator of code points.
This replaces surrogate code point pairs with supplementary code points, like concatenating ill-formed UTF-16 strings effectively would.