nlpmed_engine.api package
Submodules
nlpmed_engine.api.main module
nlpmed_engine.api.mappers module
Mapping functions between Pydantic models and internal data structures.
This module provides mapping functions that convert between the Pydantic models used in the API layer and the internal data structures that NLPMed-Engine uses for processing. Each model has a mapper in both directions, so API inputs can be converted for processing and the results converted back into API outputs.
- Functions:
- map_pydantic_to_internal_sentence:
Maps a Pydantic SentenceModel to an internal Sentence object.
- map_internal_to_pydantic_sentence_model:
Maps an internal Sentence object to a Pydantic SentenceModel.
- map_pydantic_to_internal_section:
Maps a Pydantic SectionModel to an internal Section object.
- map_internal_to_pydantic_section_model:
Maps an internal Section object to a Pydantic SectionModel.
- map_pydantic_to_internal_note:
Maps a Pydantic NoteModel to an internal Note object.
- map_internal_to_pydantic_note_model:
Maps an internal Note object to a Pydantic NoteModel.
- map_pydantic_to_internal_patient:
Maps a Pydantic PatientModel to an internal Patient object.
- map_internal_to_pydantic_patient_model:
Maps an internal Patient object to a Pydantic PatientModel.
- nlpmed_engine.api.mappers.map_internal_to_pydantic_note_model(note: Note) -> NoteModel
Map an internal Note object to a Pydantic NoteModel.
- Args:
note (Note): The internal Note object to be converted.
- Returns:
NoteModel: The corresponding Pydantic model with mapped sections and attributes.
- nlpmed_engine.api.mappers.map_internal_to_pydantic_patient_model(patient: Patient) -> PatientModel
Map an internal Patient object to a Pydantic PatientModel.
- Args:
patient (Patient): The internal Patient object to be converted.
- Returns:
PatientModel: The corresponding Pydantic model with mapped notes.
- nlpmed_engine.api.mappers.map_internal_to_pydantic_section_model(section: Section) -> SectionModel
Map an internal Section object to a Pydantic SectionModel.
- Args:
section (Section): The internal Section object to be converted.
- Returns:
SectionModel: The corresponding Pydantic model with mapped sentences and attributes.
- nlpmed_engine.api.mappers.map_internal_to_pydantic_sentence_model(sentence: Sentence) -> SentenceModel
Map an internal Sentence object to a Pydantic SentenceModel.
- Args:
sentence (Sentence): The internal Sentence object to be converted.
- Returns:
SentenceModel: The corresponding Pydantic model with mapped attributes.
- nlpmed_engine.api.mappers.map_pydantic_to_internal_note(note_model: NoteModel) -> Note
Map a Pydantic NoteModel to an internal Note object.
- Args:
note_model (NoteModel): The Pydantic model representing a note.
- Returns:
Note: The internal Note object with mapped sections and attributes.
- nlpmed_engine.api.mappers.map_pydantic_to_internal_patient(patient_model: PatientModel) -> Patient
Map a Pydantic PatientModel to an internal Patient object.
- Args:
patient_model (PatientModel): The Pydantic model representing a patient.
- Returns:
Patient: The internal Patient object with mapped notes.
- nlpmed_engine.api.mappers.map_pydantic_to_internal_section(section_model: SectionModel) -> Section
Map a Pydantic SectionModel to an internal Section object.
- Args:
section_model (SectionModel): The Pydantic model representing a section.
- Returns:
Section: The internal Section object with mapped sentences and attributes.
- nlpmed_engine.api.mappers.map_pydantic_to_internal_sentence(sentence_model: SentenceModel) -> Sentence
Map a Pydantic SentenceModel to an internal Sentence object.
- Args:
sentence_model (SentenceModel): The Pydantic model representing a sentence.
- Returns:
Sentence: The internal Sentence object with the corresponding attributes.
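The two-way mapping pattern used by these functions can be sketched with standard-library dataclasses. This is only an illustration: `SentenceModelStandIn` copies the fields documented for SentenceModel, while `SentenceStandIn` and both helper functions are hypothetical stand-ins, since the fields of the internal Sentence class are not documented on this page.

```python
from dataclasses import asdict, dataclass


@dataclass
class SentenceModelStandIn:
    """Stand-in for the API-layer SentenceModel (fields from its documented signature)."""
    text: str
    start_index: int
    end_index: int
    is_duplicate: bool = False
    is_important: bool = False
    is_expanded: bool = False


@dataclass
class SentenceStandIn:
    """Hypothetical internal Sentence; assumed here to carry the same fields."""
    text: str
    start_index: int
    end_index: int
    is_duplicate: bool = False
    is_important: bool = False
    is_expanded: bool = False


def to_internal(model: SentenceModelStandIn) -> SentenceStandIn:
    # Copy every field from the API-layer object to the internal object.
    return SentenceStandIn(**asdict(model))


def to_model(sentence: SentenceStandIn) -> SentenceModelStandIn:
    # Copy every field back for the API response.
    return SentenceModelStandIn(**asdict(sentence))


original = SentenceModelStandIn(text="No acute distress.", start_index=0, end_index=18)
round_tripped = to_model(to_internal(original))
```

A round trip through both directions should return an object equal to the input, which is a convenient property to check in unit tests for mappers of this kind.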
nlpmed_engine.api.models module
Pydantic models for NLPMed-Engine API.
This module defines the Pydantic models used for validating and managing the input, output, and configuration data for the NLPMed-Engine. The models provide a structured representation of various components involved in text processing, including sentences, sections, notes, patients, and different text processing components.
- Classes:
- StringInputModel:
Model for string inputs to be processed.
- ComponentStatusModel:
Base model for component status.
- EncodingFixerModel:
Model for encoding fixer component status.
- PatternReplacerModel:
Model for pattern replacer component with pattern and target replacements.
- WordMaskerModel:
Model for word masker component with mask settings.
- NoteFilterModel:
Model for filtering notes based on keywords.
- SectionSplitterModel:
Model for splitting sections using a delimiter.
- SectionFilterModel:
Model for filtering sections based on include and exclude keywords.
- SentenceSegmenterModel:
Model for sentence segmentation settings.
- DuplicateCheckerModel:
Model for duplicate checking configuration.
- SentenceFilterModel:
Model for filtering sentences based on keywords.
- SentenceExpanderModel:
Model for expanding short sentences.
- JoinerModel:
Model for joining sentences and sections.
- MLInferenceModel:
Model for machine learning inference settings.
- ConfigModel:
Configuration model for all text processing components.
- SentenceModel:
Model for representing a sentence with various attributes.
- SectionModel:
Model for representing a section with sentences.
- NoteModel:
Model for representing a note with sections and preprocessed text.
- PatientModel:
Model for representing a patient with associated notes.
- TextProcessingResponseModel:
Model for responses related to text processing output.
- class nlpmed_engine.api.models.ComponentStatusModel(*, status: str)
Bases: BaseModel
Base model for component status.
- Attributes:
status (str): Status of the component ('enabled', 'disabled', 'excluded').
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- status: str
- class nlpmed_engine.api.models.ConfigModel(*, encoding_fixer: dict | None = None, pattern_replacer: dict | None = None, word_masker: dict | None = None, note_filter: dict | None = None, section_splitter: dict | None = None, section_filter: dict | None = None, sentence_segmenter: dict | None = None, duplicate_checker: dict | None = None, sentence_filter: dict | None = None, sentence_expander: dict | None = None, joiner: dict | None = None, ml_inference: dict | None = None, debug: bool = False)
Bases: BaseModel
Configuration model for all text processing components.
- Attributes:
encoding_fixer (dict | None): Configuration for the encoding fixer component.
pattern_replacer (dict | None): Configuration for the pattern replacer component.
word_masker (dict | None): Configuration for the word masker component.
note_filter (dict | None): Configuration for the note filter component.
section_splitter (dict | None): Configuration for the section splitter component.
section_filter (dict | None): Configuration for the section filter component.
sentence_segmenter (dict | None): Configuration for the sentence segmenter component.
duplicate_checker (dict | None): Configuration for the duplicate checker component.
sentence_filter (dict | None): Configuration for the sentence filter component.
sentence_expander (dict | None): Configuration for the sentence expander component.
joiner (dict | None): Configuration for the joiner component.
ml_inference (dict | None): Configuration for the machine learning inference component.
debug (bool): Enables debug mode. Defaults to False.
- debug: bool
- duplicate_checker: dict | None
- encoding_fixer: dict | None
- joiner: dict | None
- ml_inference: dict | None
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- note_filter: dict | None
- pattern_replacer: dict | None
- section_filter: dict | None
- section_splitter: dict | None
- sentence_expander: dict | None
- sentence_filter: dict | None
- sentence_segmenter: dict | None
- word_masker: dict | None
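Because every component field of ConfigModel is a plain `dict | None`, a configuration can be assembled as nested dictionaries and serialized to JSON. The sketch below uses only the field names and default values documented on this page. The assumption that each nested dictionary follows the matching component model (for example PatternReplacerModel) is not confirmed by this page, and `CONFIG_FIELDS` is a helper introduced here for illustration.

```python
import json

# Field names accepted by ConfigModel, copied from its documented signature.
CONFIG_FIELDS = {
    "encoding_fixer", "pattern_replacer", "word_masker", "note_filter",
    "section_splitter", "section_filter", "sentence_segmenter",
    "duplicate_checker", "sentence_filter", "sentence_expander",
    "joiner", "ml_inference", "debug",
}

# Assumed shape: each nested dict mirrors the corresponding component model.
config = {
    "pattern_replacer": {"status": "enabled", "pattern": r"\s{4,}", "target": "\n\n"},
    "duplicate_checker": {
        "status": "enabled",
        "num_perm": 256,
        "sim_threshold": 0.9,
        "length_threshold": 50,
    },
    "joiner": {"status": "enabled", "sentence_delimiter": "\n", "section_delimiter": "\n\n"},
    "debug": True,
}

# Catch misspelled component names before sending the payload.
unknown_fields = set(config) - CONFIG_FIELDS
if unknown_fields:
    raise ValueError(f"Unknown config fields: {sorted(unknown_fields)}")

payload = json.dumps(config, indent=2)
```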
- class nlpmed_engine.api.models.DuplicateCheckerModel(*, status: str, num_perm: int = 256, sim_threshold: float = 0.9, length_threshold: int = 50)
Bases: ComponentStatusModel
Model for duplicate checking configuration.
- Attributes:
num_perm (int): Number of permutations for MinHash.
sim_threshold (float): Similarity threshold for duplicates.
length_threshold (int): Length threshold for checking duplicates.
- length_threshold: int
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- num_perm: int
- sim_threshold: float
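To make `num_perm` and `sim_threshold` concrete, here is a minimal standard-library MinHash sketch. It is not the engine's implementation, which this page does not describe: the whitespace tokenization, the salted BLAKE2b hashes standing in for permutations, and both function names are assumptions. `length_threshold` is left out because its exact semantics are not documented here.

```python
import hashlib


def minhash_signature(text: str, num_perm: int = 256) -> list[int]:
    """Return one minimum hash value per simulated permutation."""
    tokens = set(text.lower().split())
    if not tokens:
        raise ValueError("text must contain at least one token")
    signature = []
    for seed in range(num_perm):
        # A different salt per seed simulates a different permutation.
        salt = seed.to_bytes(4, "big")
        signature.append(
            min(
                int.from_bytes(
                    hashlib.blake2b(tok.encode("utf-8"), digest_size=8, salt=salt).digest(),
                    "big",
                )
                for tok in tokens
            )
        )
    return signature


def estimated_similarity(a: str, b: str, num_perm: int = 256) -> float:
    """Estimate Jaccard similarity as the fraction of matching signature slots."""
    sig_a = minhash_signature(a, num_perm)
    sig_b = minhash_signature(b, num_perm)
    return sum(x == y for x, y in zip(sig_a, sig_b)) / num_perm


sim_threshold = 0.9
same = estimated_similarity("patient denies chest pain", "patient denies chest pain")
different = estimated_similarity("patient denies chest pain", "follow up in two weeks")
is_duplicate = same >= sim_threshold
```

A larger `num_perm` gives a more precise similarity estimate at the cost of more hashing per sentence.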
- class nlpmed_engine.api.models.EncodingFixerModel(*, status: str)
Bases: ComponentStatusModel
Model for encoding fixer component status.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- class nlpmed_engine.api.models.JoinerModel(*, status: str, sentence_delimiter: str = '\n', section_delimiter: str = '\n\n')
Bases: ComponentStatusModel
Model for joining sentences and sections.
- Attributes:
sentence_delimiter (str): Delimiter for joining sentences.
section_delimiter (str): Delimiter for joining sections.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- section_delimiter: str
- sentence_delimiter: str
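The effect of the two default delimiters can be illustrated with plain string joins. This sketch assumes the joiner concatenates sentences within a section first and then the sections. The sample sentences are invented for the example.

```python
sentence_delimiter = "\n"    # documented default
section_delimiter = "\n\n"   # documented default

# Each inner list holds the sentences of one section.
sections = [
    ["Patient is stable.", "No fever."],
    ["Continue current medication."],
]

joined = section_delimiter.join(
    sentence_delimiter.join(sentences) for sentences in sections
)
```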
- class nlpmed_engine.api.models.MLInferenceModel(*, status: str, device: str = 'cpu', ml_model_path: str, ml_tokenizer_path: str)
Bases: ComponentStatusModel
Model for machine learning inference settings.
- Attributes:
device (str): Device used for model inference (e.g., 'cpu', 'cuda').
ml_model_path (str): Path to the model.
ml_tokenizer_path (str): Path to the tokenizer.
- device: str
- ml_model_path: str
- ml_tokenizer_path: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- class nlpmed_engine.api.models.MlModelInfo(*, name: str, device: str, max_length: int, loaded: bool, loaded_at: str | None = None)
Bases: BaseModel
- device: str
- loaded: bool
- loaded_at: str | None
- max_length: int
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- name: str
- class nlpmed_engine.api.models.MlModelsResponse(*, default_name: str, models: list[MlModelInfo])
Bases: BaseModel
- default_name: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- models: list[MlModelInfo]
- class nlpmed_engine.api.models.NoteFilterModel(*, status: str, words_to_search: list[str] = <factory>)
Bases: ComponentStatusModel
Model for filtering notes based on keywords.
- Attributes:
words_to_search (list[str]): Keywords to search in the notes.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- words_to_search: list[str]
- class nlpmed_engine.api.models.NoteModel(*, text: str, sections: list[SectionModel] = <factory>, preprocessed_text: str | None = None, predicted_label: str | None = None, predicted_score: float | None = None, note_id: str | None = None)
Bases: BaseModel
Model for representing a note with sections and preprocessed text.
- Attributes:
note_id (str | None): Unique identifier for the note.
text (str): Text of the note.
sections (list[SectionModel]): List of sections in the note.
preprocessed_text (str | None): Preprocessed text of the note.
predicted_label (str | None): Predicted label from the model inference.
predicted_score (float | None): Predicted score from the model inference.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- note_id: str | None
- predicted_label: str | None
- predicted_score: float | None
- preprocessed_text: str | None
- sections: list[SectionModel]
- text: str
- class nlpmed_engine.api.models.PatientModel(*, patient_id: str, notes: list[NoteModel])
Bases: BaseModel
Model for representing a patient with associated notes.
- Attributes:
patient_id (str): Unique identifier for the patient.
notes (list[NoteModel]): List of notes associated with the patient.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- notes: list[NoteModel]
- patient_id: str
- class nlpmed_engine.api.models.PatternReplacerModel(*, status: str, pattern: str = '\\s{4,}', target: str = '\n\n')
Bases: ComponentStatusModel
Model for pattern replacer component with pattern and target replacements.
- Attributes:
pattern (str): Regex pattern to replace in the text.
target (str): Target string to replace matched pattern.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- pattern: str
- target: str
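The default `pattern` matches runs of four or more whitespace characters and the default `target` is a blank line, so together they turn wide gaps into paragraph breaks. The following sketch applies those defaults with `re.sub`. Whether the component itself calls `re.sub` is an assumption, and the sample text is invented.

```python
import re

pattern = r"\s{4,}"  # documented default: four or more whitespace characters
target = "\n\n"      # documented default: a blank line

raw = "HISTORY:" + " " * 5 + "Patient reports cough." + " " * 8 + "PLAN: Rest."

# Single spaces are untouched; only the long runs are replaced.
cleaned = re.sub(pattern, target, raw)
```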
- class nlpmed_engine.api.models.SectionFilterModel(*, status: str, section_inc_list: list[str] = <factory>, section_exc_list: list[str] = <factory>, fallback: bool = False)
Bases: ComponentStatusModel
Model for filtering sections based on include and exclude keywords.
- Attributes:
section_inc_list (list[str]): Keywords for including sections.
section_exc_list (list[str]): Keywords for excluding sections.
fallback (bool): Enable fallback behavior if no sections match.
- fallback: bool
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- section_exc_list: list[str]
- section_inc_list: list[str]
- class nlpmed_engine.api.models.SectionModel(*, text: str, start_index: int, end_index: int, sentences: list[SentenceModel] = <factory>, important_indices: list[int] = <factory>, duplicate_indices: list[int] = <factory>, expanded_indices: list[int] = <factory>, is_important: bool = False)
Bases: BaseModel
Model for representing a section with sentences.
- Attributes:
text (str): Text of the section.
start_index (int): Start index of the section in the original text.
end_index (int): End index of the section in the original text.
sentences (list[SentenceModel]): List of sentences in the section.
important_indices (list[int]): Indices of important sentences in the section.
duplicate_indices (list[int]): Indices of duplicate sentences in the section.
expanded_indices (list[int]): Indices of expanded sentences in the section.
is_important (bool): Indicates if the section is marked as important.
- duplicate_indices: list[int]
- end_index: int
- expanded_indices: list[int]
- important_indices: list[int]
- is_important: bool
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- sentences: list[SentenceModel]
- start_index: int
- text: str
- class nlpmed_engine.api.models.SectionSplitterModel(*, status: str, delimiter: str = '\n\n')
Bases: ComponentStatusModel
Model for splitting sections using a delimiter.
- Attributes:
delimiter (str): Delimiter used to split sections.
- delimiter: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
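Splitting on the delimiter while recording character offsets shows how `delimiter` relates to the `start_index` and `end_index` fields of SectionModel. The sketch below is a guess at that relationship: treating `end_index` as exclusive, skipping empty parts, and the `split_sections` helper itself are all assumptions.

```python
def split_sections(text: str, delimiter: str = "\n\n") -> list[tuple[str, int, int]]:
    """Split text on the delimiter and return (section_text, start, end) tuples."""
    sections = []
    start = 0
    for part in text.split(delimiter):
        end = start + len(part)  # exclusive end offset (assumed convention)
        if part:
            sections.append((part, start, end))
        # Skip over the delimiter to reach the start of the next part.
        start = end + len(delimiter)
    return sections


note_text = "HPI: cough for 3 days.\n\nPLAN: rest and fluids."
sections = split_sections(note_text)
```

With an exclusive end offset, `note_text[start:end]` reproduces each section's text exactly.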
- class nlpmed_engine.api.models.SentenceExpanderModel(*, status: str, length_threshold: int = 50)
Bases: ComponentStatusModel
Model for expanding short sentences.
- Attributes:
length_threshold (int): Threshold length for expanding short sentences.
- length_threshold: int
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- class nlpmed_engine.api.models.SentenceFilterModel(*, status: str, words_to_search: list[str] = <factory>)
Bases: ComponentStatusModel
Model for filtering sentences based on keywords.
- Attributes:
words_to_search (list[str]): Keywords to filter sentences.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- words_to_search: list[str]
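Keyword filtering of this kind can be sketched as a search over sentence texts that returns the positions of the matches, similar in spirit to the `important_indices` field of SectionModel. The matching rule (case-insensitive substring search) and the `find_important_indices` helper are assumptions. The engine's actual rule is not documented on this page.

```python
def find_important_indices(sentences: list[str], words_to_search: list[str]) -> list[int]:
    """Return the indices of sentences containing any keyword (case-insensitive)."""
    keywords = [word.lower() for word in words_to_search]
    return [
        index
        for index, sentence in enumerate(sentences)
        if any(keyword in sentence.lower() for keyword in keywords)
    ]


sentences = [
    "No chest pain.",
    "History of pulmonary embolism.",
    "Follow up in two weeks.",
]
matches = find_important_indices(sentences, ["Embolism", "thrombosis"])
```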
- class nlpmed_engine.api.models.SentenceModel(*, text: str, start_index: int, end_index: int, is_duplicate: bool = False, is_important: bool = False, is_expanded: bool = False)
Bases: BaseModel
Model for representing a sentence with various attributes.
- Attributes:
text (str): Text of the sentence.
start_index (int): Start index of the sentence in the original text.
end_index (int): End index of the sentence in the original text.
is_duplicate (bool): Indicates if the sentence is marked as duplicate.
is_important (bool): Indicates if the sentence is marked as important.
is_expanded (bool): Indicates if the sentence has been expanded.
- end_index: int
- is_duplicate: bool
- is_expanded: bool
- is_important: bool
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- start_index: int
- text: str
- class nlpmed_engine.api.models.SentenceSegmenterModel(*, status: str, nlp_model_name: str = 'en_core_sci_lg', batch_size: int = 10)
Bases: ComponentStatusModel
Model for sentence segmentation settings.
- Attributes:
nlp_model_name (str): Name of the model used for sentence segmentation.
batch_size (int): Batch size for processing.
- batch_size: int
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- nlp_model_name: str
- class nlpmed_engine.api.models.StringInputModel(*, text: str)
Bases: BaseModel
Model for string inputs to be processed.
- Attributes:
text (str): Text input to be processed.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- text: str
- class nlpmed_engine.api.models.TextProcessingResponseModel(*, preprocessed_text: str | None = None, predicted_label: str | None = None, predicted_score: float | None = None, note: NoteModel | None = None)
Bases: BaseModel
Model for responses related to text processing output.
- Attributes:
preprocessed_text (str | None): The preprocessed text output.
predicted_label (str | None): The predicted label from the model inference.
predicted_score (float | None): The prediction score associated with the predicted label.
note (NoteModel | None): The note object returned in debug mode.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- note: NoteModel | None
- predicted_label: str | None
- predicted_score: float | None
- preprocessed_text: str | None
- class nlpmed_engine.api.models.WordMaskerModel(*, status: str, words_to_mask: list[str] = <factory>, mask_char: str = '*')
Bases: ComponentStatusModel
Model for word masker component with mask settings.
- Attributes:
words_to_mask (list[str]): List of words to mask in the text.
mask_char (str): Character used for masking.
- mask_char: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- words_to_mask: list[str]
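One plausible reading of `words_to_mask` and `mask_char` is that each listed word is overwritten character by character. The sketch below follows that reading with a whole-word, case-insensitive regex. The matching rule, the choice to preserve word length, and the `mask_words` helper are assumptions rather than documented behavior.

```python
import re


def mask_words(text: str, words_to_mask: list[str], mask_char: str = "*") -> str:
    """Replace each listed word with mask_char repeated to the word's length."""
    if not words_to_mask:
        return text
    # Escape the words so they are matched literally, then match whole words only.
    alternatives = "|".join(re.escape(word) for word in words_to_mask)
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return pattern.sub(lambda match: mask_char * len(match.group(0)), text)


masked = mask_words("Seen by Dr. Smith today.", ["smith"])
```

Keeping the masked text the same length as the original leaves character offsets such as `start_index` and `end_index` valid after masking.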