lingpy.read package

Submodules

lingpy.read.csv module

This module provides functions for reading CSV files.

lingpy.read.csv.csv2dict(filename, fileformat=None, dtype=None, comment='#', sep='\t', strip_lines=True, header=False)

Very simple function to get quick access to CSV-files.

Parameters:

filename : str

Name of the input file.

fileformat : {None, str}

If not specified, the file <filename> will be loaded. Otherwise, fileformat is interpreted as the extension of the input file.

dtype : {None, list}

If not specified, all data will be loaded as strings. Otherwise, a list specifying the data type of each cell in a line should be provided.

comment : string (default="#")

A comment character at the beginning of a line causes that line to be ignored.

sep : string (default="\t")

Specify the separator for the CSV-file.

strip_lines : bool (default=True)

Specify whether empty “cells” in the input file should be preserved. If set to False, each line will be stripped first, and all whitespace will be cleaned. Otherwise, each line will be split using the specified separator, and no whitespace will be stripped.

header : bool (default=False)

Indicate whether the data comes with a header.

Returns:

d : dict

A dictionary representation of the CSV file, with the first cell of each row used as the key and the remaining cells of that row as the value.
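The returned structure can be approximated in plain Python. The following is a minimal sketch and not LingPy's implementation: the helper name rows_to_dict is hypothetical, and it assumes that the first cell of each line serves as the key and the remaining cells as the value.

```python
import io


def rows_to_dict(handle, comment="#", sep="\t"):
    """Hypothetical stand-in that mimics the documented csv2dict behaviour."""
    result = {}
    for line in handle:
        # Skip comment lines and blank lines.
        if (comment and line.startswith(comment)) or not line.strip():
            continue
        cells = [cell.strip() for cell in line.rstrip("\r\n").split(sep)]
        # Assumption: the first cell is the key, the rest is the value.
        result[cells[0]] = cells[1:]
    return result


sample = io.StringIO("# a comment line\nhand\tGerman\tHand\nfoot\tGerman\tFuß\n")
table = rows_to_dict(sample)
```

With the real function, the same input stored in a file would presumably be read with csv2dict(filename).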

lingpy.read.csv.csv2list(filename, fileformat='', dtype=None, comment='#', sep='\t', strip_lines=True, header=False)

Very simple function to get quick (and somewhat naive) access to CSV-files.

Parameters:

filename : str

Name of the input file.

fileformat : {None, str}

If not specified, the file <filename> will be loaded. Otherwise, fileformat is interpreted as the extension of the input file.

dtype : {None, list}

If not specified, all data will be loaded as strings. Otherwise, a list specifying the data type of each cell in a line should be provided.

comment : string (default="#")

A comment character at the beginning of a line causes that line to be ignored (set to None if you want to parse all lines of your file).

sep : string (default="\t")

Specify the separator for the CSV-file.

strip_lines : bool (default=True)

Specify whether empty “cells” in the input file should be preserved. If set to False, each line will be stripped first, and all whitespace will be cleaned. Otherwise, each line will be split using the specified separator, and no whitespace will be stripped.

header : bool (default=False)

Indicate whether the data comes with a header.

Returns:

l : list

A list-representation of the CSV file.
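How dtype and header interact can be illustrated with a short stand-alone sketch. It is not LingPy's code: the helper name rows_to_list is hypothetical, and it assumes that dtype holds one converter per column and that a header, if present, is the first non-comment line and is dropped from the result.

```python
import io


def rows_to_list(handle, dtype=None, comment="#", sep="\t", header=False):
    """Hypothetical stand-in that mimics the documented csv2list behaviour."""
    rows = []
    for line in handle:
        # Skip comment lines and blank lines.
        if (comment and line.startswith(comment)) or not line.strip():
            continue
        rows.append([cell.strip() for cell in line.rstrip("\r\n").split(sep)])
    # Assumption: the header is the first remaining line and is discarded.
    if header and rows:
        rows = rows[1:]
    if dtype:
        # Apply one converter per column; extra cells stay strings.
        rows = [
            [dtype[i](cell) if i < len(dtype) else cell for i, cell in enumerate(row)]
            for row in rows
        ]
    return rows


sample = io.StringIO("ID\tCONCEPT\tSCORE\n# skipped\n1\thand\t0.5\n2\tfoot\t0.25\n")
rows = rows_to_list(sample, dtype=[int, str, float], header=True)
```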

lingpy.read.csv.csv2multidict(filename, comment='#', sep='\t')

Function reads a csv-file into a multi-dimensional dictionary structure.

lingpy.read.csv.read_asjp(infile, family='Indo-European', classification='hh', max_synonyms=2, min_population=<function <lambda>>, merge_vowels=True, evaluate=False)

lingpy.read.phylip module

This module provides functions for reading various formats used by the Phylip package.

lingpy.read.phylip.read_dst(filename, taxlen=10, comment='#')

Function reads files in Phylip dst-format.

Parameters:

filename : string

Name of the file, which should have the extension dst.

taxlen : int (default=10)

Maximum length of the taxon names in the input file. The Phylip package only allows taxon names of at most 10 characters (this is the default). Other packages, however, allow more. If Phylip compatibility is not important to you and you want to allow taxon names of arbitrary length, set this value to 0 and make sure to use tabstops as separators between values in your matrix file.

comment : str (default="#")

The comment character to be used if your file contains additional information which should be ignored.

Returns:

data : tuple

A tuple consisting of a list of taxa and a matrix.
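The effect of taxlen can be illustrated with a small stand-alone parser. This is a sketch under stated assumptions, not LingPy's implementation: the name parse_dst is hypothetical, the input is assumed to be a square matrix whose first line holds the number of taxa, and with taxlen=0 all fields are assumed to be separated by tabstops.

```python
def parse_dst(text, taxlen=10, comment="#"):
    """Hypothetical parser for a Phylip-style square distance matrix."""
    lines = [
        line for line in text.splitlines()
        if line.strip() and not (comment and line.startswith(comment))
    ]
    taxa, matrix = [], []
    # The first remaining line holds the number of taxa; the rows follow.
    for line in lines[1:]:
        if taxlen:
            # Fixed-width taxon name, as in strict Phylip files.
            name, values = line[:taxlen], line[taxlen:].split()
        else:
            # Assumption: with taxlen=0, all fields are tab-separated.
            name, *values = line.split("\t")
        taxa.append(name.strip())
        matrix.append([float(value) for value in values])
    return taxa, matrix


text = (
    "3\n"
    "German    0.0 0.5 0.6\n"
    "English   0.5 0.0 0.4\n"
    "Dutch     0.6 0.4 0.0\n"
)
taxa, matrix = parse_dst(text)

# Long taxon names require taxlen=0 and tabstops between all fields.
long_taxa, long_matrix = parse_dst(
    "2\nProto-Germanic\t0.0\t0.3\nOld High German\t0.3\t0.0\n", taxlen=0
)
```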

lingpy.read.phylip.read_scorer(infile)

Read a scoring function from a file into a ScoreDict object.

Parameters:

infile : str

The path to the input file that shall be read as a scoring dictionary. The matrix format is a simple csv-file in which the scoring matrix is displayed, with negative values indicating high differences between sound segments (or sound classes) and positive values indicating high similarity. The matrix should be symmetric, columns should be separated by tabstops, and the first column should provide the alphabet for which the scoring function is defined.

Returns:

scoredict : lingpy.algorithm.misc.ScoreDict

A ScoreDict instance which can be directly passed to LingPy’s alignment functions.
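The matrix layout described for infile can be illustrated as follows. This is a hedged sketch only: parse_scorer is a hypothetical helper that returns a plain dictionary rather than a ScoreDict, and the three-symbol alphabet and its scores are invented for illustration.

```python
def parse_scorer(text):
    """Hypothetical reader turning a tab-separated matrix into pair scores."""
    rows = [line.split("\t") for line in text.splitlines() if line.strip()]
    # The first column provides the alphabet; the remaining cells are scores.
    alphabet = [row[0] for row in rows]
    scores = {}
    for symbol_a, row in zip(alphabet, rows):
        for symbol_b, value in zip(alphabet, row[1:]):
            scores[symbol_a, symbol_b] = float(value)
    return alphabet, scores


# Invented example: positive values for similar sounds, negative for dissimilar.
text = "p\t10\t5\t-10\nb\t5\t10\t-10\na\t-10\t-10\t5\n"
alphabet, scores = parse_scorer(text)

# A well-formed matrix is symmetric.
symmetric = all(scores[a, b] == scores[b, a] for a in alphabet for b in alphabet)
```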

lingpy.read.qlc module

lingpy.read.qlc.normalize_alignment(alignment)

Function normalizes an alignment.

Normalization here means that columns consisting only of gaps will be deleted, and all sequences will be stretched to equal length by adding gap characters at the end of shorter sequences.
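The two steps can be sketched in a few lines. This is an illustration and not LingPy's code: the function name normalize is hypothetical, sequences are assumed to be lists of segments, and "-" is assumed to be the gap symbol.

```python
def normalize(alignment, gap="-"):
    """Hypothetical sketch of the normalization described above."""
    width = max(len(seq) for seq in alignment)
    # Stretch shorter sequences with trailing gaps.
    padded = [list(seq) + [gap] * (width - len(seq)) for seq in alignment]
    # Keep only columns that contain at least one non-gap segment.
    keep = [i for i in range(width) if any(seq[i] != gap for seq in padded)]
    return [[seq[i] for i in keep] for seq in padded]


alignment = [
    ["h", "-", "a", "n", "d"],
    ["h", "-", "a", "n"],
    ["-", "-", "a", "n", "t"],
]
normalized = normalize(alignment)
```

In this example the second column contains only gaps and is removed, while the second sequence receives one trailing gap.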

lingpy.read.qlc.read_msa(infile, comment='#', ids=False, header=True, normalize=True, **keywords)

Simple function to load an MSA object.

Parameters:

infile : str

The name of the input file.

comment : str (default="#")

The comment character. If a line starts with this character, it will be ignored.

ids : bool (default=False)

Indicate whether the MSA file contains unique IDs for all sequences or not.

Returns:

d : dict

A dictionary in which keys correspond to specific parts of a multiple alignment. This dictionary can be directly passed to alignment functions, such as lingpy.sca.MSA.

lingpy.read.qlc.read_qlc(infile, comment='#')

Simple function that loads qlc-format into a dictionary.

Parameters:

infile : str

The name of the input file.

comment : str (default="#")

The comment character. If a line starts with this character, it will be ignored.

Returns:

d : dict

A dictionary with integer keys corresponding to the order of the lines of the input file. The header is stored under the key 0.
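The shape of the returned dictionary can be sketched as follows. The helper number_rows is hypothetical and greatly simplified: it assumes tab-separated columns and treats the first non-comment line as the header. The column names in the sample are chosen for illustration only.

```python
def number_rows(text, comment="#", sep="\t"):
    """Hypothetical sketch: number the rows, with the header under key 0."""
    rows = [
        line.split(sep) for line in text.splitlines()
        if line.strip() and not (comment and line.startswith(comment))
    ]
    # Assumption: the first non-comment line is the header and gets key 0.
    return dict(enumerate(rows))


text = (
    "# illustrative data\n"
    "ID\tDOCULECT\tCONCEPT\tIPA\n"
    "1\tGerman\thand\thant\n"
    "2\tEnglish\thand\thænd\n"
)
data = number_rows(text)
header = data[0]
```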

lingpy.read.qlc.reduce_alignment(alignment)

Function reduces a given alignment.

Notes

Reduction here means that the output alignment consists only of those parts which the user has not marked to be ignored (parts in brackets). It requires that all data is properly coded. If reduction fails, a warning is issued, and all brackets are simply removed from the output alignment.
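A simplified sketch of this behaviour is given below. It is not LingPy's implementation: the name reduce_columns is hypothetical, and the sketch assumes that "properly coded" means every ignored stretch is enclosed by one column consisting solely of "(" and one consisting solely of ")".

```python
import warnings


def reduce_columns(alignment):
    """Hypothetical sketch: drop columns enclosed in bracket columns."""
    kept, ignoring, consistent = [], False, True
    for column in zip(*alignment):
        symbols = set(column)
        if symbols == {"("} and not ignoring:
            ignoring = True
        elif symbols == {")"} and ignoring:
            ignoring = False
        elif symbols & {"(", ")"}:
            # Brackets are not coded consistently across the sequences.
            consistent = False
            break
        elif not ignoring:
            kept.append(column)
    if not consistent or ignoring:
        # Fallback described above: warn and only strip the brackets.
        warnings.warn("Could not reduce the alignment, removing brackets only.")
        return [[seg for seg in seq if seg not in ("(", ")")] for seq in alignment]
    return [list(seq) for seq in zip(*kept)]


alignment = [
    ["v", "o", "(", "l", "-", ")", "k"],
    ["w", "o", "(", "l", "e", ")", "k"],
]
reduced = reduce_columns(alignment)
```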

lingpy.read.starling module

Basic parser for Starling data.

lingpy.read.starling.star2qlc(filename, clean_taxnames=False, debug=False)

Converts a file output directly by Starling to LingPy-QLC format.

Module contents