Struct Wtf8Buf

pub struct Wtf8Buf { /* private fields */ }

An owned, growable string of well-formed WTF-8 data.

Similar to String, but can additionally contain surrogate code points if they’re not in a surrogate pair.
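The difference from String can be illustrated with the standard library alone. The sketch below does not use Wtf8Buf; the helper names `is_surrogate` and `try_as_std_string` are hypothetical. It only shows that a standard String rejects an unpaired surrogate, which is the case Wtf8Buf is described as accepting.

```rust
/// Returns true if `unit` is a UTF-16 surrogate code unit (U+D800..=U+DFFF).
fn is_surrogate(unit: u16) -> bool {
    (0xD800..=0xDFFF).contains(&unit)
}

/// Tries to store UTF-16 code units in a standard `String`.
/// Returns `None` when the input contains an unpaired surrogate,
/// because `String` must always hold well-formed UTF-8.
fn try_as_std_string(units: &[u16]) -> Option<String> {
    String::from_utf16(units).ok()
}
```

A properly paired surrogate sequence such as [0xD83D, 0xDE00] converts to a String, while [0x0061, 0xD800] does not.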

Implementations

impl Wtf8Buf

pub fn new() -> Wtf8Buf

Create a new, empty WTF-8 string.

pub fn with_capacity(n: usize) -> Wtf8Buf

Create a new, empty WTF-8 string with pre-allocated capacity for n bytes.

pub fn from_string(string: String) -> Wtf8Buf

Create a WTF-8 string from a UTF-8 String.

This takes ownership of the String and does not copy.

Since WTF-8 is a superset of UTF-8, this always succeeds.

pub fn from_str(s: &str) -> Wtf8Buf

Create a WTF-8 string from a UTF-8 &str slice.

This copies the content of the slice.

Since WTF-8 is a superset of UTF-8, this always succeeds.

pub fn from_ill_formed_utf16(v: &[u16]) -> Wtf8Buf

Create a WTF-8 string from a potentially ill-formed UTF-16 slice of 16-bit code units.

This is lossless: calling .to_ill_formed_utf16() on the resulting string will always return the original code units.
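The lossless round trip can be sketched with the standard library only. This is not the crate's implementation; `utf16_to_wtf8`, `wtf8_to_utf16`, and `cont` are hypothetical helpers, and the decoder assumes its input is well-formed WTF-8 (it would panic on truncated input). Unpaired surrogates are written as 3-byte sequences, and properly paired ones become a single 4-byte sequence.

```rust
/// Encode potentially ill-formed UTF-16 as WTF-8 bytes.
fn utf16_to_wtf8(units: &[u16]) -> Vec<u8> {
    let mut out = Vec::new();
    for item in char::decode_utf16(units.iter().copied()) {
        match item {
            // Scalar values (including decoded surrogate pairs) use plain UTF-8.
            Ok(c) => {
                let mut buf = [0u8; 4];
                out.extend_from_slice(c.encode_utf8(&mut buf).as_bytes());
            }
            // An unpaired surrogate is written as a 3-byte sequence.
            Err(e) => {
                let s = e.unpaired_surrogate();
                out.push(0xE0 | (s >> 12) as u8);
                out.push(0x80 | ((s >> 6) & 0x3F) as u8);
                out.push(0x80 | (s & 0x3F) as u8);
            }
        }
    }
    out
}

/// Payload bits of a continuation byte.
fn cont(b: u8) -> u32 {
    (b & 0x3F) as u32
}

/// Decode well-formed WTF-8 bytes back to 16-bit code units.
/// Assumption: the input is well-formed; no validation is performed.
fn wtf8_to_utf16(bytes: &[u8]) -> Vec<u16> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        let b = bytes[i];
        let (cp, len) = if b < 0x80 {
            (b as u32, 1)
        } else if b < 0xE0 {
            (((b & 0x1F) as u32) << 6 | cont(bytes[i + 1]), 2)
        } else if b < 0xF0 {
            (((b & 0x0F) as u32) << 12 | cont(bytes[i + 1]) << 6 | cont(bytes[i + 2]), 3)
        } else {
            (
                ((b & 0x07) as u32) << 18
                    | cont(bytes[i + 1]) << 12
                    | cont(bytes[i + 2]) << 6
                    | cont(bytes[i + 3]),
                4,
            )
        };
        i += len;
        if cp >= 0x10000 {
            // Supplementary code points become a surrogate pair again.
            let v = cp - 0x10000;
            out.push(0xD800 | (v >> 10) as u16);
            out.push(0xDC00 | (v & 0x3FF) as u16);
        } else {
            // BMP code points, including lone surrogates, map to one unit.
            out.push(cp as u16);
        }
    }
    out
}
```

Feeding the output of `utf16_to_wtf8` into `wtf8_to_utf16` returns the original code units, including any lone surrogates.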

pub fn reserve(&mut self, additional: usize)

Reserves capacity for at least additional more bytes to be inserted in the given Wtf8Buf. The collection may reserve more space to avoid frequent reallocations.

§Panics

Panics if the new capacity overflows usize.

pub fn capacity(&self) -> usize

Returns the number of bytes that this string buffer can hold without reallocating.

pub fn push_str(&mut self, other: &str)

Append a UTF-8 slice at the end of the string.

pub fn push_wtf8(&mut self, other: &Wtf8)

Append a WTF-8 slice at the end of the string.

This replaces newly paired surrogates at the boundary with a supplementary code point, like concatenating ill-formed UTF-16 strings effectively would.
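The boundary rule can be sketched at the byte level. The following is a std-only illustration under the assumption that both inputs are already well-formed WTF-8; the functions `combine_surrogates`, `decode_surrogate`, and `push_wtf8_bytes` are hypothetical and are not the crate's code. If the buffer ends with a lead surrogate and the appended bytes start with a trail surrogate, the two 3-byte sequences are replaced by one 4-byte sequence.

```rust
/// Combine a lead and a trail surrogate into a supplementary code point.
fn combine_surrogates(lead: u16, trail: u16) -> Option<char> {
    if !(0xD800..=0xDBFF).contains(&lead) || !(0xDC00..=0xDFFF).contains(&trail) {
        return None;
    }
    let cp = 0x10000 + (((lead - 0xD800) as u32) << 10) + (trail - 0xDC00) as u32;
    char::from_u32(cp)
}

/// Decode a 3-byte surrogate sequence (ED A0..BF 80..BF), if that is what `bytes` holds.
fn decode_surrogate(bytes: &[u8]) -> Option<u16> {
    match bytes {
        [0xED, b2 @ 0xA0..=0xBF, b3 @ 0x80..=0xBF] => {
            Some(0xD000 | ((*b2 as u16 & 0x3F) << 6) | (*b3 as u16 & 0x3F))
        }
        _ => None,
    }
}

/// Append `other` to `buf`, merging a lead/trail surrogate pair that
/// meets at the boundary. Assumes both are well-formed WTF-8.
fn push_wtf8_bytes(buf: &mut Vec<u8>, other: &[u8]) {
    let n = buf.len();
    if n >= 3 && other.len() >= 3 {
        if let (Some(lead), Some(trail)) =
            (decode_surrogate(&buf[n - 3..]), decode_surrogate(&other[..3]))
        {
            if let Some(c) = combine_surrogates(lead, trail) {
                // Drop the lead surrogate and write the 4-byte form instead.
                buf.truncate(n - 3);
                let mut tmp = [0u8; 4];
                buf.extend_from_slice(c.encode_utf8(&mut tmp).as_bytes());
                buf.extend_from_slice(&other[3..]);
                return;
            }
        }
    }
    buf.extend_from_slice(other);
}
```

For example, U+D83D followed by U+DE00 combines into U+1F600, matching what concatenating the two UTF-16 code units would produce.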

pub fn push_char(&mut self, c: char)

Append a Unicode scalar value at the end of the string.

pub fn push(&mut self, code_point: CodePoint)

Append a code point at the end of the string.

This replaces newly paired surrogates at the boundary with a supplementary code point, like concatenating ill-formed UTF-16 strings effectively would.

pub fn truncate(&mut self, new_len: usize)

Shortens a string to the specified length.

§Failure

Fails if new_len > current length, or if new_len is not a code point boundary.
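A code point boundary can be recognized from the bytes alone, because continuation bytes always have the bit pattern 10xxxxxx. The sketch below is a std-only illustration; `is_code_point_boundary` is a hypothetical free function operating on a byte slice, not a documented method of this type.

```rust
/// Returns true if `index` is the start of a code point or the end of the data.
fn is_code_point_boundary(bytes: &[u8], index: usize) -> bool {
    if index == bytes.len() {
        return true;
    }
    match bytes.get(index) {
        // Continuation bytes look like 0b10xx_xxxx; anything else starts a code point.
        Some(&b) => (b & 0xC0) != 0x80,
        // Past the end of the data.
        None => false,
    }
}
```

For the bytes [0x61, 0xC3, 0xA9] ("a" followed by "é"), indices 0, 1, and 3 are boundaries, while index 2 falls inside the two-byte sequence.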

pub fn clear(&mut self)

Clear the WTF-8 vector, removing all contents.

pub fn into_string(self) -> Result<String, Wtf8Buf>

Consume the WTF-8 string and try to convert it to UTF-8.

This does not copy the data.

If the contents are not well-formed UTF-8 (that is, if the string contains surrogates), the original WTF-8 string is returned instead.

pub fn into_string_lossy(self) -> String

Consume the WTF-8 string and convert it lossily to UTF-8.

This does not copy the data (but may overwrite parts of it in place).

Surrogates are replaced with "\u{FFFD}" (the replacement character “�”).
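In-place replacement is possible because a surrogate occupies three bytes in WTF-8 and U+FFFD also occupies three bytes in UTF-8. The sketch below is one way this could work, assuming well-formed WTF-8 input; `replace_surrogates_in_place` is a hypothetical helper on a plain byte vector, not the crate's implementation.

```rust
/// Overwrite every 3-byte surrogate sequence with U+FFFD (EF BF BD),
/// then reinterpret the same buffer as a `String`.
/// Assumption: `bytes` is well-formed WTF-8.
fn replace_surrogates_in_place(mut bytes: Vec<u8>) -> String {
    let mut i = 0;
    while i + 2 < bytes.len() {
        // A lead byte 0xED followed by 0xA0..=0xBF encodes U+D800..=U+DFFF.
        if bytes[i] == 0xED && bytes[i + 1] >= 0xA0 {
            bytes[i..i + 3].copy_from_slice("\u{FFFD}".as_bytes());
            i += 3;
        } else {
            i += 1;
        }
    }
    String::from_utf8(bytes).expect("input was assumed to be well-formed WTF-8")
}
```

Since the replacement has the same length as what it overwrites, the buffer never grows and no other bytes need to move.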

pub fn from_bytes(bytes: Vec<u8>) -> Result<Wtf8Buf, Vec<u8>>

Create a Wtf8Buf from a WTF-8 encoded byte vector.

Returns Ok(Wtf8Buf) if the bytes are well-formed WTF-8, or Err(bytes) with the original bytes if validation fails.

This validates that:

All bytes form valid UTF-8 sequences OR valid surrogate code point encodings
Surrogate code points may appear unpaired and be encoded separately, but if they are paired, they must be encoded as a single 4-byte UTF-8 sequence. For example, the byte sequence [0xED, 0xA0, 0x80, 0xED, 0xB0, 0x80] is not valid WTF-8 because WTF-8 forbids encoding a surrogate pair as two separate 3-byte sequences.
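The second rule can be sketched as a scan for a lead surrogate sequence immediately followed by a trail surrogate sequence. This is only a partial check written against the standard library; `has_split_surrogate_pair` is a hypothetical helper, and it assumes the bytes already satisfy the first rule. Under that assumption 0xED can only appear as a lead byte, so a window scan cannot misalign.

```rust
/// Returns true if `bytes` contains a surrogate pair encoded as two
/// separate 3-byte sequences (ED A0..AF xx, then ED B0..BF xx).
fn has_split_surrogate_pair(bytes: &[u8]) -> bool {
    bytes.windows(6).any(|w| {
        w[0] == 0xED
            && (0xA0..=0xAF).contains(&w[1]) // lead surrogate U+D800..=U+DBFF
            && w[3] == 0xED
            && (0xB0..=0xBF).contains(&w[4]) // trail surrogate U+DC00..=U+DFFF
    })
}
```

The sequence [0xED, 0xA0, 0x80, 0xED, 0xB0, 0x80] from the example above is flagged, while a lone surrogate or a trail followed by a lead is not.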

pub unsafe fn from_bytes_unchecked(bytes: Vec<u8>) -> Wtf8Buf

Create a Wtf8Buf from a WTF-8 encoded byte vector without checking that the bytes contain valid WTF-8.

For the safe version, see Wtf8Buf::from_bytes.

§Safety

The bytes passed in must be valid WTF-8. See Wtf8Buf::from_bytes for the requirements.

Methods from Deref<Target = Wtf8>

pub fn len(&self) -> usize

Return the length, in WTF-8 bytes.

pub fn is_empty(&self) -> bool

Return true if the string has a length of zero bytes.

pub fn is_ascii(&self) -> bool

Return true if the string contains only ASCII characters.

pub fn slice(&self, begin: usize, end: usize) -> &Wtf8

Return a slice of the given string for the byte range [begin..end).

§Failure

Fails when begin and end do not point to code point boundaries, or point beyond the end of the string.

pub fn slice_from(&self, begin: usize) -> &Wtf8

Return a slice of the given string from byte begin to its end.

§Failure

Fails when begin is not at a code point boundary, or is beyond the end of the string.

pub fn slice_to(&self, end: usize) -> &Wtf8

Return a slice of the given string from its beginning to byte end.

§Failure

Fails when end is not at a code point boundary, or is beyond the end of the string.

pub fn ascii_byte_at(&self, position: usize) -> u8

Return the code point at position if it is in the ASCII range, or b'\xFF' otherwise.

§Failure

Fails if position is beyond the end of the string.
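The 0xFF sentinel is unambiguous because 0xFF never occurs in UTF-8 or WTF-8 data. A minimal std-only sketch of the described behavior follows; `ascii_or_ff` is a hypothetical free function over a byte slice, not the method itself.

```rust
/// Return the byte at `position` if it is ASCII, or 0xFF otherwise.
/// Panics if `position` is out of bounds, like ordinary slice indexing.
fn ascii_or_ff(bytes: &[u8], position: usize) -> u8 {
    match bytes[position] {
        b @ 0x00..=0x7F => b,
        _ => 0xFF,
    }
}
```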

pub fn code_points(&self) -> Wtf8CodePoints<'_>

Return an iterator for the string’s code points.

pub fn contains_char(&self, ch: char) -> bool

Returns true if this WTF-8 string contains the given character.

pub fn contains(&self, code_point: CodePoint) -> bool

Returns true if this WTF-8 string contains the given code point.

pub fn starts_with(&self, pattern: &str) -> bool

Returns true if this WTF-8 string starts with the given UTF-8 string.

pub fn as_str(&self) -> Option<&str>

Try to convert the string to UTF-8 and return a &str slice.

Return None if the string contains surrogates.

This does not copy the data.

pub fn as_bytes(&self) -> &[u8]

Return the underlying WTF-8 bytes.

pub fn to_string_lossy(&self) -> Cow<'_, str>

Lossily convert the string to UTF-8. Return a UTF-8 &str slice if the contents are well-formed in UTF-8.

Surrogates are replaced with "\u{FFFD}" (the replacement character “�”).

This only copies the data if necessary (if it contains any surrogate).
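The copy-only-if-necessary behavior maps naturally onto Cow. The sketch below is a std-only illustration under the assumption that the input is well-formed WTF-8; `wtf8_lossy` is a hypothetical helper over a byte slice and is not the crate's implementation.

```rust
use std::borrow::Cow;

/// Borrow the input when it is already valid UTF-8; otherwise copy it
/// and replace each 3-byte surrogate sequence with U+FFFD.
/// Assumption: `bytes` is well-formed WTF-8.
fn wtf8_lossy(bytes: &[u8]) -> Cow<'_, str> {
    // Fast path: no surrogates, so the bytes are plain UTF-8 and no copy is made.
    if let Ok(s) = std::str::from_utf8(bytes) {
        return Cow::Borrowed(s);
    }
    // Slow path: copy once, then patch surrogates in the copy.
    let mut owned = bytes.to_vec();
    let mut i = 0;
    while i + 2 < owned.len() {
        if owned[i] == 0xED && owned[i + 1] >= 0xA0 {
            owned[i..i + 3].copy_from_slice("\u{FFFD}".as_bytes());
            i += 3;
        } else {
            i += 1;
        }
    }
    Cow::Owned(String::from_utf8(owned).expect("input assumed to be well-formed WTF-8"))
}
```

Callers that only read the result can treat both variants as &str through Deref, and pay for an allocation only when a surrogate is present.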

pub fn to_ill_formed_utf16(&self) -> IllFormedUtf16CodeUnits<'_>

Convert the WTF-8 string to potentially ill-formed UTF-16 and return an iterator of 16-bit code units.

This is lossless: calling Wtf8Buf::from_ill_formed_utf16 on the resulting code units would always return the original WTF-8 string.

pub fn to_uppercase(&self) -> Wtf8Buf

Returns the uppercase equivalent of this WTF-8 slice, as a new Wtf8Buf.

pub fn to_lowercase(&self) -> Wtf8Buf

Returns the lowercase equivalent of this WTF-8 slice, as a new Wtf8Buf.

Trait Implementations

impl Add<&Wtf8> for Wtf8Buf

type Output = Wtf8Buf

The resulting type after applying the + operator.

fn add(self, rhs: &Wtf8) -> <Wtf8Buf as Add<&Wtf8>>::Output

Performs the + operation.

impl Borrow<Wtf8> for Wtf8Buf

fn borrow(&self) -> &Wtf8

Immutably borrows from an owned value.

impl Clone for Wtf8Buf

fn clone(&self) -> Wtf8Buf

Returns a copy of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for Wtf8Buf

Format the string with double quotes, and surrogates as \u followed by four hexadecimal digits in braces. Example: "a\u{D800}" for a string with code points [U+0061, U+D800].

fn fmt(&self, formatter: &mut Formatter<'_>) -> Result<(), Error>

Formats the value using the given formatter.

impl Default for Wtf8Buf

fn default() -> Wtf8Buf

Returns the “default value” for a type.

impl Deref for Wtf8Buf

type Target = Wtf8

The resulting type after dereferencing.

fn deref(&self) -> &Wtf8

Dereferences the value.

impl Extend<CodePoint> for Wtf8Buf

Append code points from an iterator to the string.

This replaces surrogate code point pairs with supplementary code points, like concatenating ill-formed UTF-16 strings effectively would.

fn extend<T>(&mut self, iterable: T)
where T: IntoIterator<Item = CodePoint>,

Extends a collection with the contents of an iterator.

fn extend_one(&mut self, item: CodePoint)

fn extend_reserve(&mut self, additional: usize)

impl From<&Wtf8Atom> for Wtf8Buf

fn from(s: &Wtf8Atom) -> Self

impl FromIterator<CodePoint> for Wtf8Buf

fn from_iter<T>(iterable: T) -> Wtf8Buf
where T: IntoIterator<Item = CodePoint>,

impl FromStr for Wtf8Buf

type Err = Infallible

fn from_str(s: &str) -> Result<Wtf8Buf, <Wtf8Buf as FromStr>::Err>

impl Hash for Wtf8Buf

fn hash<H>(&self, state: &mut H)
where H: Hasher,

fn hash_slice<H>(data: &[Self], state: &mut H)
where H: Hasher, Self: Sized,

impl Ord for Wtf8Buf

fn cmp(&self, other: &Wtf8Buf) -> Ordering

fn max(self, other: Self) -> Self
where Self: Sized,

fn min(self, other: Self) -> Self
where Self: Sized,

fn clamp(self, min: Self, max: Self) -> Self
where Self: Sized,

impl<T> Any for T
where T: 'static + ?Sized,

impl<T> Borrow<T> for T
where T: ?Sized,

impl<T> BorrowMut<T> for T
where T: ?Sized,

impl<T> CloneToUninit for T
where T: Clone,

impl<Q, K> Comparable<K> for Q
where Q: Ord + ?Sized, K: Borrow<Q> + ?Sized,

impl<Q, K> Equivalent<K> for Q
where Q: Eq + ?Sized, K: Borrow<Q> + ?Sized,

impl<Q, K> Equivalent<K> for Q
where Q: Eq + ?Sized, K: Borrow<Q> + ?Sized,

impl<T, U> Into<U> for T
where U: From<T>,

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

impl<T, N1, N2> Niching<NichedOption<T, N1>> for N2
where T: SharedNiching<N1, N2>, N1: Niching<T>, N2: Niching<T>,