pub struct ICU4XWordSegmenter(/* private fields */);
An ICU4X word-break segmenter, capable of finding word breakpoints in strings.
Implementations
impl ICU4XWordSegmenter
pub fn create_auto(provider: &ICU4XDataProvider) -> Result<Box<ICU4XWordSegmenter>, ICU4XError>
Constructs an ICU4XWordSegmenter that automatically selects the best available LSTM or dictionary payload data.
Note: currently, it uses the dictionary model for Chinese and Japanese, and the LSTM model for Burmese, Khmer, Lao, and Thai.
pub fn create_lstm(provider: &ICU4XDataProvider) -> Result<Box<ICU4XWordSegmenter>, ICU4XError>
Constructs an ICU4XWordSegmenter with LSTM payload data for Burmese, Khmer, Lao, and Thai.
Warning: an ICU4XWordSegmenter created by this function does not handle Chinese or Japanese.
pub fn create_dictionary(provider: &ICU4XDataProvider) -> Result<Box<ICU4XWordSegmenter>, ICU4XError>
Constructs an ICU4XWordSegmenter with dictionary payload data for Chinese, Japanese, Burmese, Khmer, Lao, and Thai.
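The language coverage described for the three constructors can be summarized as a small lookup. The sketch below is only an illustration of the documentation text: `Constructor`, `Model`, and `documented_model` are hypothetical names that are not part of this crate, and the language subtags (`zh`, `ja`, `my`, `km`, `lo`, `th`) are assumed to stand for Chinese, Japanese, Burmese, Khmer, Lao, and Thai.

```rust
/// Hypothetical labels for the three constructors documented above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Constructor {
    Auto,
    Lstm,
    Dictionary,
}

/// Hypothetical labels for the kind of payload data a constructor loads.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Model {
    Lstm,
    Dictionary,
}

/// Returns the payload kind that the documentation associates with `lang`
/// for the given constructor, or `None` when the documentation lists no
/// LSTM or dictionary payload for that combination.
pub fn documented_model(ctor: Constructor, lang: &str) -> Option<Model> {
    // Chinese and Japanese.
    let cj = matches!(lang, "zh" | "ja");
    // Burmese, Khmer, Lao, and Thai.
    let sea = matches!(lang, "my" | "km" | "lo" | "th");
    match ctor {
        Constructor::Auto if cj => Some(Model::Dictionary),
        Constructor::Auto if sea => Some(Model::Lstm),
        Constructor::Lstm if sea => Some(Model::Lstm),
        Constructor::Dictionary if cj || sea => Some(Model::Dictionary),
        _ => None,
    }
}
```

For example, `documented_model(Constructor::Lstm, "ja")` returns `None`, which mirrors the warning on `create_lstm`.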
pub fn segment_utf8<'a>(&'a self, input: &'a DiplomatStr) -> Box<ICU4XWordBreakIteratorUtf8<'a>>
Segments a UTF-8 string.
Ill-formed input is treated as if errors had been replaced with REPLACEMENT CHARACTERs (U+FFFD) according to the WHATWG Encoding Standard.
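To get a feel for what "errors replaced with REPLACEMENT CHARACTERs" means for UTF-8 input, the standard library's lossy decoder can be used as a stand-in. This is a minimal sketch, not the segmenter's own code path: `utf8_after_replacement` is a hypothetical helper, and it is assumed here that `String::from_utf8_lossy` substitutes U+FFFD in the same way for the inputs shown.

```rust
use std::borrow::Cow;

/// Hypothetical helper: returns the text with every ill-formed UTF-8
/// sequence replaced by U+FFFD REPLACEMENT CHARACTER.
/// Well-formed input is borrowed unchanged; no copy is made.
pub fn utf8_after_replacement(input: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(input)
}

/// Hypothetical helper: counts the U+FFFD characters present in the
/// decoded text (including any that were already in the input).
pub fn replacement_count(input: &[u8]) -> usize {
    utf8_after_replacement(input)
        .chars()
        .filter(|&c| c == '\u{FFFD}')
        .count()
}
```

For example, the bytes `b"Hello\xFFWorld"` decode to `Hello`, one U+FFFD, then `World`, since `0xFF` can never appear in well-formed UTF-8.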
pub fn segment_utf16<'a>(&'a self, input: &'a DiplomatStr16) -> Box<ICU4XWordBreakIteratorUtf16<'a>>
Segments a UTF-16 string.
Ill-formed input is treated as if errors had been replaced with REPLACEMENT CHARACTERs (U+FFFD) according to the WHATWG Encoding Standard.
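For UTF-16, ill-formed input means unpaired surrogate code units. The sketch below uses the standard library's lossy UTF-16 decoder to show the replacement; `utf16_after_replacement` is a hypothetical helper and only approximates how the segmenter views such input.

```rust
/// Hypothetical helper: decodes UTF-16 code units, replacing each
/// unpaired surrogate with U+FFFD REPLACEMENT CHARACTER.
pub fn utf16_after_replacement(input: &[u16]) -> String {
    String::from_utf16_lossy(input)
}
```

A lone high surrogate such as `0xD800` between `H` and `i` becomes a single U+FFFD, while a valid surrogate pair such as `0xD83D 0xDE00` decodes to one code point (U+1F600).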
pub fn segment_latin1<'a>(&'a self, input: &'a [u8]) -> Box<ICU4XWordBreakIteratorLatin1<'a>>
Segments a Latin-1 string.
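Unlike the UTF-8 and UTF-16 variants, Latin-1 input cannot be ill-formed: every byte value 0x00 to 0xFF corresponds directly to the code point with the same number. The sketch below illustrates that mapping with a hypothetical helper, `latin1_to_string`, which is not part of this crate.

```rust
/// Hypothetical helper: interprets each byte as the Unicode code point
/// with the same value (U+0000 to U+00FF), which is the Latin-1 mapping.
pub fn latin1_to_string(input: &[u8]) -> String {
    input.iter().map(|&b| char::from(b)).collect()
}
```

For example, the four bytes `[0x63, 0x61, 0x66, 0xE9]` are the four characters of "café". The same text takes five bytes in UTF-8, so byte offsets into a Latin-1 buffer and into its UTF-8 equivalent are not interchangeable.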
Auto Trait Implementations
impl Freeze for ICU4XWordSegmenter
impl RefUnwindSafe for ICU4XWordSegmenter
impl !Send for ICU4XWordSegmenter
impl !Sync for ICU4XWordSegmenter
impl Unpin for ICU4XWordSegmenter
impl UnwindSafe for ICU4XWordSegmenter
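Because the type is neither `Send` nor `Sync`, a value cannot be moved to or shared with another thread; each thread that needs one has to construct its own. The sketch below shows that pattern with a hypothetical stand-in type, `LocalSegmenter`, which is made `!Send` and `!Sync` with a raw-pointer marker. Its `count_words` method just splits on whitespace and is not real word segmentation.

```rust
use std::marker::PhantomData;
use std::thread;

/// Hypothetical stand-in for a thread-confined segmenter. The raw-pointer
/// marker makes the type neither `Send` nor `Sync`.
pub struct LocalSegmenter {
    _not_send: PhantomData<*const ()>,
}

impl LocalSegmenter {
    pub fn new() -> Self {
        LocalSegmenter { _not_send: PhantomData }
    }

    /// Placeholder logic: counts whitespace-separated words.
    pub fn count_words(&self, text: &str) -> usize {
        text.split_whitespace().count()
    }
}

/// Moves only the `String` (which is `Send`) into the worker thread and
/// constructs the thread-confined value inside that thread.
pub fn count_words_on_worker(text: String) -> usize {
    thread::spawn(move || {
        let segmenter = LocalSegmenter::new();
        segmenter.count_words(&text)
    })
    .join()
    .expect("worker thread panicked")
}
```

Creating the value outside the closure and capturing it would be rejected by the compiler, since `thread::spawn` requires its closure to be `Send`.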
Blanket Implementations
impl<T> BorrowMut<T> for T where T: ?Sized
fn borrow_mut(&mut self) -> &mut T
impl<T> Filterable for T
fn filterable(self, filter_name: &'static str) -> RequestFilterDataProvider<T, fn(DataRequest<'_>) -> bool>
impl<T> ErasedDestructor for T where T: 'static
impl<T> MaybeSendSync for T
Layout
Note: Most layout information is completely unstable and may even differ between compilations. The only exception is types with certain repr(...) attributes. Please see the Rust Reference's “Type Layout” chapter for details on type layout guarantees.
Size: 1752 bytes