Where the Minor Things Are (WtMTA): (Yet Another) Minor Intron Database

The Where the Minor Things Are (WtMTA) intron database contains information about introns in more than 1,500 species identified by Larue & Roy, 2023 as containing minor introns, with more than 250 million rows in total. For every annotated intron included in our analyses, the data include the type classification (major or minor), phase, genomic coordinates and other attributes, along with additional metadata about parent genes, transcripts, and genomes.

Intron classifications were generated using intronIC, and other intron-based metadata (introns per kbp of coding sequence, etc.) were obtained using custom Python workflows. All underlying data were sourced from publicly available genomic resources such as NCBI, Ensembl and JGI.

Exploring the database

Unless you are interested in the entirety of the data (see the section on running the database locally), the best place to start exploring may be via the genomes table. There, you can select a species of interest and drill down to the associated introns and/or transcripts for further filtering.
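The same drill-down can be reproduced in SQL against a local copy of the database. The following is a minimal sketch, not the site's own implementation: it builds a tiny in-memory stand-in using a subset of the column names listed under Tables, and it assumes that taxonomy_id links introns to genomes and that is_minor is stored as 0/1. The demo rows and the helper names build_demo_db and minor_introns_for_species are made up for illustration.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory stand-in with a subset of the WtMTA columns."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE genomes (taxonomy_id INTEGER PRIMARY KEY, species TEXT, phylum TEXT);
        CREATE TABLE introns (id INTEGER PRIMARY KEY, taxonomy_id INTEGER,
                              is_minor INTEGER, length INTEGER, phase INTEGER);
        -- Made-up demo rows, not real WtMTA data.
        INSERT INTO genomes VALUES (1, 'Species alpha', 'Phylum A'),
                                   (2, 'Species beta', 'Phylum B');
        INSERT INTO introns VALUES (10, 1, 1, 120, 0),
                                   (11, 1, 0, 950, 1),
                                   (12, 2, 1, 88, 2);
        """
    )
    return conn


def minor_introns_for_species(conn, species):
    """Find the genome row by species name, then fetch its minor introns."""
    row = conn.execute(
        "SELECT taxonomy_id FROM genomes WHERE species = ?", (species,)
    ).fetchone()
    if row is None:
        return []
    # Assumption: is_minor = 1 marks minor introns.
    return conn.execute(
        "SELECT id, length, phase FROM introns "
        "WHERE taxonomy_id = ? AND is_minor = 1 ORDER BY id",
        (row[0],),
    ).fetchall()


if __name__ == "__main__":
    demo = build_demo_db()
    print(minor_introns_for_species(demo, "Species alpha"))
```

Against the real file, the connection would point at WtMTA.db rather than the in-memory demo, and the is_minor encoding should be checked before relying on the filter.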

The results of any query can be downloaded in a number of plaintext formats (e.g., CSV), provided they don’t exceed 1 GB (see Advanced Export below the paginated results; select stream all rows to ensure the full dataset is returned). This should be sufficient to retrieve, for example, the complete intron/transcript set for any individual genome, or a subset of introns/transcripts across a number of different genomes.
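For scripted downloads, the export can also be requested by URL. The sketch below only builds such a URL; it relies on Datasette's standard .csv endpoint and its _stream=on parameter, which corresponds to the stream all rows option. The base URL, the taxonomy_id value, and the helper name csv_export_url are placeholders, and the 1 GB limit described above still applies.

```python
from urllib.parse import urlencode


def csv_export_url(base_url, table, filters=None, stream=True):
    """Build a Datasette CSV export URL for a table.

    base_url is the database URL (hypothetical here); filters maps column
    names to exact-match values, e.g. {"taxonomy_id": 12345}.
    """
    params = dict(filters or {})
    if stream:
        # Ask Datasette to stream every matching row, not just the first page.
        params["_stream"] = "on"
    url = f"{base_url.rstrip('/')}/{table}.csv"
    query = urlencode(params)
    return f"{url}?{query}" if query else url


if __name__ == "__main__":
    # Placeholder host and taxonomy ID, for illustration only.
    print(csv_export_url("https://example.org/WtMTA", "introns", {"taxonomy_id": 12345}))
```

The resulting URL can be passed to any HTTP client (for example curl or urllib.request) to save the CSV to disk.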

Searching within tables

The genomes and transcripts tables provide limited search functionality, allowing for queries of complete words only (i.e., no wildcards). For example, to return information about all cnidarian genomes, the genomes table should be searched for cnidaria, but not (for example) cnidar*.
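If partial matching is needed, one workaround is a SQL LIKE filter, run either through a custom SQL query or against a local copy. This is a sketch under assumptions, separate from the built-in search: it assumes the phylum column holds names such as Cnidaria, and the demo rows and the helper name genomes_with_phylum_prefix are illustrative only. SQLite's LIKE is case-insensitive for ASCII text by default.

```python
import sqlite3


def genomes_with_phylum_prefix(conn, prefix):
    """Return (species, phylum) rows whose phylum starts with the given prefix."""
    # Escape LIKE metacharacters in the input so only the trailing % is a wildcard.
    escaped = prefix.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return conn.execute(
        "SELECT species, phylum FROM genomes "
        "WHERE phylum LIKE ? ESCAPE '\\' ORDER BY species",
        (escaped + "%",),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE genomes (taxonomy_id INTEGER, species TEXT, phylum TEXT)")
    # Made-up demo rows, not real WtMTA data.
    conn.executemany(
        "INSERT INTO genomes VALUES (?, ?, ?)",
        [(1, "Species A", "Cnidaria"), (2, "Species B", "Chordata")],
    )
    print(genomes_with_phylum_prefix(conn, "cnidar"))
```

A prefix filter like this cannot use the full-text index, which is acceptable for the small genomes table but may be slow on the much larger transcripts table.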

Obtaining a local copy of the DB

The SQLite database file was created using sqlite-utils and Datasette.

You are free to download the entire WtMTA database file via the link at the bottom of this page. After doing so, you can recreate most of the functionality of this website on a local computer/server.

To explore a local version of this database using Datasette, first install Datasette:

python3 -m pip install datasette

Then, run Datasette with the SQLite database file:

datasette -i WtMTA.db

This command will start a local web server (the default URL will be displayed by Datasette automatically), and you can explore the database interactively using your web browser. See Datasette’s documentation for details and additional options.
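The downloaded file is an ordinary SQLite database, so it can also be queried directly without Datasette. The sketch below opens a file read-only using Python's standard sqlite3 module, so the download cannot be modified by accident, and lists the tables it contains. The helper names open_readonly and list_tables are hypothetical, and the path assumes WtMTA.db is in the current directory.

```python
import sqlite3
from pathlib import Path


def open_readonly(db_path):
    """Open a SQLite file in read-only mode via a file: URI."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def list_tables(conn):
    """Return the names of all tables in the database, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    # Assumes the downloaded file sits in the current working directory.
    if Path("WtMTA.db").exists():
        conn = open_readonly("WtMTA.db")
        print(list_tables(conn))
```

Listing tables is inexpensive; full scans of the introns table, by contrast, touch hundreds of millions of rows and can take a while on ordinary hardware.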

Data license: ODbL · Data source: Larue & Roy, 2023

Queries

  • BED-formatted introns

Tables

genomes

taxonomy_id, species, family, order, phylum, accession, n_minor_introns, n_major_introns, percent_minor_introns, busco_score, minor_snRNAs, genome_version, source_url, source_metadata, minor_intron+

1,575 rows

introns

id, dinucleotide_pair, is_minor, score, length, transcript_id, ordinal_index, start, end, taxonomy_id, scored_motifs, phase, in_cds, relative_position

214,855,132 rows

transcripts

id, taxonomy_id, transcript_id, gene_id, chromosome, strand, start, end, coding_length, introns_per_kbp_cds, proportion_minor_introns, n_introns, n_minor_introns

35,702,876 rows

Download SQLite DB: WtMTA.db 37.8 GB