Struct Wtf8

pub struct Wtf8 { /* private fields */ }

A borrowed slice of well-formed WTF-8 data.

Similar to &str, but can additionally contain surrogate code points if they’re not in a surrogate pair.
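To make the difference from &str concrete, here is a minimal std-only sketch of how a code point, including a lone surrogate, can be encoded with the UTF-8 bit layout that WTF-8 reuses. The helper name encode_code_point is hypothetical and is not part of this crate's API.

```rust
/// Encode one code point (U+0000..=U+10FFFF, surrogates allowed) using the
/// same bit layout as UTF-8. Hypothetical helper for illustration only.
fn encode_code_point(cp: u32) -> Vec<u8> {
    assert!(cp <= 0x10FFFF, "not a Unicode code point");
    match cp {
        // One byte: plain ASCII.
        0..=0x7F => vec![cp as u8],
        // Two bytes: 110xxxxx 10xxxxxx.
        0x80..=0x7FF => vec![0xC0 | (cp >> 6) as u8, 0x80 | (cp & 0x3F) as u8],
        // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx.
        // Surrogates U+D800..=U+DFFF land here; UTF-8 forbids them, WTF-8 does not.
        0x800..=0xFFFF => vec![
            0xE0 | (cp >> 12) as u8,
            0x80 | ((cp >> 6) & 0x3F) as u8,
            0x80 | (cp & 0x3F) as u8,
        ],
        // Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx.
        _ => vec![
            0xF0 | (cp >> 18) as u8,
            0x80 | ((cp >> 12) & 0x3F) as u8,
            0x80 | ((cp >> 6) & 0x3F) as u8,
            0x80 | (cp & 0x3F) as u8,
        ],
    }
}
```

With this sketch, U+D800 becomes the three bytes ED A0 80, which std::str::from_utf8 rejects. That is why such data cannot be held in a &str.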

Implementations

impl Wtf8

pub const fn from_str(value: &str) -> &Wtf8

Create a WTF-8 slice from a UTF-8 &str slice.

Since WTF-8 is a superset of UTF-8, this always succeeds.

pub const fn len(&self) -> usize

Return the length, in WTF-8 bytes.

pub const fn is_empty(&self) -> bool

Return true if the string has a length of zero bytes.

pub const fn is_ascii(&self) -> bool

Return true if the string contains only ASCII characters.

pub fn slice(&self, begin: usize, end: usize) -> &Wtf8

Return a slice of the given string for the half-open byte range [begin, end).

§Failure

Fails when begin or end does not point to a code point boundary, or points beyond the end of the string.
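The boundary rule can be sketched on raw bytes. This is an illustration under the assumption that the bytes are well-formed WTF-8, not this crate's actual implementation, and is_code_point_boundary is a hypothetical name.

```rust
/// Returns true if `index` is a code point boundary in `bytes`.
/// Assumes `bytes` is well-formed WTF-8. Hypothetical helper.
fn is_code_point_boundary(bytes: &[u8], index: usize) -> bool {
    // The end of the string always counts as a boundary.
    if index == bytes.len() {
        return true;
    }
    match bytes.get(index) {
        // Continuation bytes have the form 10xxxxxx; anything else starts a code point.
        Some(&b) => (b & 0xC0) != 0x80,
        // Past the end: not a valid position at all.
        None => false,
    }
}
```

For the bytes of "aé" (61 C3 A9), indices 0, 1, and 3 are boundaries, while index 2 falls inside the two-byte encoding of é.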

pub fn slice_from(&self, begin: usize) -> &Wtf8

Return a slice of the given string from byte begin to its end.

§Failure

Fails when begin is not at a code point boundary, or is beyond the end of the string.

pub fn slice_to(&self, end: usize) -> &Wtf8

Return a slice of the given string from its beginning to byte end.

§Failure

Fails when end is not at a code point boundary, or is beyond the end of the string.

pub fn ascii_byte_at(&self, position: usize) -> u8

Return the code point at position if it is in the ASCII range, or b'\xFF' otherwise.

§Failure

Fails if position is beyond the end of the string.
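The described behavior can be mimicked on a plain byte slice. This is a sketch only; ascii_or_ff is a hypothetical stand-in, not the crate's method.

```rust
/// Returns the byte at `position` if it is ASCII, or 0xFF otherwise.
/// Panics if `position` is out of bounds, mirroring the documented failure case.
/// Hypothetical helper operating on the raw WTF-8 bytes.
fn ascii_or_ff(bytes: &[u8], position: usize) -> u8 {
    match bytes[position] {
        b @ 0x00..=0x7F => b,
        // 0xFF never occurs in well-formed WTF-8, so it works as a sentinel.
        _ => 0xFF,
    }
}
```

Because 0xFF can never appear in well-formed WTF-8, a caller can compare the result against an ASCII byte without first checking for a sentinel.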

pub fn code_points(&self) -> Wtf8CodePoints<'_>

Return an iterator for the string’s code points.

pub fn contains_char(&self, ch: char) -> bool

Returns true if this WTF-8 string contains the given character.

pub fn contains(&self, code_point: CodePoint) -> bool

Returns true if this WTF-8 string contains the given code point.

pub fn starts_with(&self, pattern: &str) -> bool

Returns true if this WTF-8 string starts with the given UTF-8 string.

pub fn as_str(&self) -> Option<&str>

Try to convert the string to UTF-8 and return a &str slice.

Return None if the string contains surrogates.

This does not copy the data.
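A std-only sketch of why no copy is needed: well-formed WTF-8 without surrogates is already valid UTF-8, so the same bytes can be reinterpreted in place. The function as_utf8_str is hypothetical and assumes its input is well-formed WTF-8.

```rust
/// Reinterpret well-formed WTF-8 bytes as `&str` if they contain no surrogates.
/// Hypothetical helper; relies on std's UTF-8 validation rejecting surrogate
/// encodings (ED A0..BF xx).
fn as_utf8_str(wtf8_bytes: &[u8]) -> Option<&str> {
    // The returned &str borrows the same memory as the input slice.
    std::str::from_utf8(wtf8_bytes).ok()
}
```

The returned slice points at the original buffer, so its pointer equals the pointer of the input bytes.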

pub const fn as_bytes(&self) -> &[u8]

Return the underlying WTF-8 bytes.

pub fn to_string_lossy(&self) -> Cow<'_, str>

Lossily convert the string to UTF-8. Return a borrowed UTF-8 &str slice if the contents are well-formed UTF-8.

Surrogates are replaced with "\u{FFFD}" (the replacement character “�”).

This only copies the data if necessary (if it contains any surrogate).
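The copy-only-if-necessary behavior can be sketched with std types. This assumes the input is well-formed WTF-8 and is not the crate's implementation; lossy_utf8 is a hypothetical name.

```rust
use std::borrow::Cow;

/// Lossily convert well-formed WTF-8 bytes to UTF-8.
/// Hypothetical helper; panics if the input is not well-formed WTF-8.
fn lossy_utf8(bytes: &[u8]) -> Cow<'_, str> {
    // Fast path: no surrogates means the bytes are already valid UTF-8.
    if let Ok(s) = std::str::from_utf8(bytes) {
        return Cow::Borrowed(s);
    }
    let mut out = bytes.to_vec();
    let mut i = 0;
    while i + 2 < out.len() {
        // A surrogate U+D800..=U+DFFF is encoded as ED A0..=BF 80..=BF.
        // 0xED is never a continuation byte, so a byte-wise scan is safe.
        if out[i] == 0xED && out[i + 1] >= 0xA0 {
            // U+FFFD is EF BF BD, also three bytes, so it can be written in place.
            out[i..i + 3].copy_from_slice(&[0xEF, 0xBF, 0xBD]);
            i += 3;
        } else {
            i += 1;
        }
    }
    Cow::Owned(String::from_utf8(out).expect("input was not well-formed WTF-8"))
}
```

Input without surrogates comes back as Cow::Borrowed. Input with a surrogate is copied once and each surrogate is overwritten with U+FFFD.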

pub fn to_ill_formed_utf16(&self) -> IllFormedUtf16CodeUnits<'_>

Convert the WTF-8 string to potentially ill-formed UTF-16 and return an iterator of 16-bit code units.

This is lossless: calling Wtf8Buf::from_ill_formed_utf16 on the resulting code units would always return the original WTF-8 string.
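The round trip can be illustrated with a std-only sketch. Both functions below are hypothetical stand-ins operating on raw bytes and code units, assume well-formed WTF-8 input, and are not the crate's implementation.

```rust
/// Decode well-formed WTF-8 bytes into potentially ill-formed UTF-16.
/// Hypothetical helper for illustration.
fn wtf8_to_utf16(bytes: &[u8]) -> Vec<u16> {
    let mut units = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        let b = bytes[i];
        // Sequence length and payload bits of the lead byte.
        let (len, init) = match b {
            0x00..=0x7F => (1, b as u32),
            0xC0..=0xDF => (2, (b & 0x1F) as u32),
            0xE0..=0xEF => (3, (b & 0x0F) as u32),
            _ => (4, (b & 0x07) as u32),
        };
        let mut cp = init;
        for &c in &bytes[i + 1..i + len] {
            cp = (cp << 6) | (c & 0x3F) as u32;
        }
        if cp >= 0x10000 {
            // Supplementary code point: split into a surrogate pair.
            let v = cp - 0x10000;
            units.push(0xD800 | (v >> 10) as u16);
            units.push(0xDC00 | (v & 0x3FF) as u16);
        } else {
            // BMP code point, including lone surrogates: one unit as-is.
            units.push(cp as u16);
        }
        i += len;
    }
    units
}

/// Encode potentially ill-formed UTF-16 back into WTF-8 bytes.
/// Hypothetical helper for illustration.
fn utf16_to_wtf8(units: &[u16]) -> Vec<u8> {
    let mut out = Vec::new();
    for item in char::decode_utf16(units.iter().copied()) {
        match item {
            // Proper scalar values (including combined pairs) use normal UTF-8.
            Ok(c) => {
                let mut buf = [0u8; 4];
                out.extend_from_slice(c.encode_utf8(&mut buf).as_bytes());
            }
            // Unpaired surrogates are written as three-byte sequences.
            Err(e) => {
                let s = e.unpaired_surrogate() as u32;
                out.extend_from_slice(&[
                    0xE0 | (s >> 12) as u8,
                    0x80 | ((s >> 6) & 0x3F) as u8,
                    0x80 | (s & 0x3F) as u8,
                ]);
            }
        }
    }
    out
}
```

Feeding the output of wtf8_to_utf16 into utf16_to_wtf8 reproduces the original bytes, because well-formed WTF-8 never stores a surrogate pair as two three-byte sequences.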

pub fn to_uppercase(&self) -> Wtf8Buf

Returns the uppercase equivalent of this WTF-8 slice, as a new Wtf8Buf.

pub fn to_lowercase(&self) -> Wtf8Buf

Returns the lowercase equivalent of this WTF-8 slice, as a new Wtf8Buf.

pub fn from_bytes(bytes: &[u8]) -> Result<&Wtf8, &[u8]>

Create a WTF-8 slice from a WTF-8 encoded byte slice.

Returns Ok(&Wtf8) if the bytes are well-formed WTF-8, or Err(bytes) with the original byte slice if validation fails.

This validates that:

All bytes form valid UTF-8 sequences OR valid surrogate code point encodings
Surrogate code points may appear unpaired and be encoded separately, but if they are paired, they must be encoded as a single 4-byte UTF-8 sequence. For example, the byte sequence [0xED, 0xA0, 0x80, 0xED, 0xB0, 0x80] is not valid WTF-8 because WTF-8 forbids encoding a surrogate pair as two separate 3-byte sequences.
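The two rules above can be sketched as a std-only validator. This is one possible way to check them, not necessarily how this crate does it, and is_well_formed_wtf8 is a hypothetical name.

```rust
/// Returns true if `bytes` is well-formed WTF-8.
/// Hypothetical helper: valid UTF-8 runs are delegated to std, and
/// surrogate encodings are handled by hand.
fn is_well_formed_wtf8(bytes: &[u8]) -> bool {
    let mut i = 0;
    // True when the previous code point was a lead (high) surrogate.
    let mut prev_high_surrogate = false;
    while i < bytes.len() {
        let rest = &bytes[i..];
        // Surrogate U+D800..=U+DFFF: ED A0..=BF 80..=BF.
        if rest.len() >= 3
            && rest[0] == 0xED
            && (0xA0..=0xBF).contains(&rest[1])
            && (0x80..=0xBF).contains(&rest[2])
        {
            // Second byte B0..=BF means a trail (low) surrogate U+DC00..=U+DFFF.
            let is_low = rest[1] >= 0xB0;
            if prev_high_surrogate && is_low {
                // A pair split into two 3-byte sequences is forbidden.
                return false;
            }
            prev_high_surrogate = !is_low;
            i += 3;
            continue;
        }
        match std::str::from_utf8(rest) {
            // Everything remaining is valid UTF-8.
            Ok(_) => return true,
            // Not a surrogate and not UTF-8: ill-formed.
            Err(e) if e.valid_up_to() == 0 => return false,
            // Skip the valid UTF-8 run, then re-check at the error position.
            Err(e) => {
                i += e.valid_up_to();
                prev_high_surrogate = false;
            }
        }
    }
    true
}
```

Under this sketch, [0xED, 0xA0, 0x80] (a lone U+D800) is accepted, while [0xED, 0xA0, 0x80, 0xED, 0xB0, 0x80] is rejected because it spells a surrogate pair as two separate sequences.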

pub const unsafe fn from_bytes_unchecked(bytes: &[u8]) -> &Wtf8

Create a WTF-8 slice from a WTF-8 encoded byte slice without checking that the bytes contain valid WTF-8.

For the safe version, see Wtf8::from_bytes.

§Safety

The bytes passed in must be valid WTF-8. See Wtf8::from_bytes for the requirements.

Trait Implementations

impl Add<&Wtf8> for Wtf8Buf

type Output = Wtf8Buf

The resulting type after applying the + operator.

fn add(self, rhs: &Wtf8) -> <Wtf8Buf as Add<&Wtf8>>::Output

Performs the + operation.

impl Borrow<Wtf8> for Wtf8Buf

fn borrow(&self) -> &Wtf8

Immutably borrows from an owned value.

impl Debug for Wtf8

Format the slice with double quotes, and surrogates as \u{XXXX} escapes, where XXXX is four hexadecimal digits. Example: "a\u{D800}" for a slice with code points [U+0061, U+D800].
