Datasets

Re: Datasets

Postby cwyoo » Sat Feb 07, 2015 5:50 am

meninonas wrote:Dr. Yoo,

Please find the two datasets below. I added the JPEG file for the first Cell Cycle run. I have already started running the second Banjo analysis for Cell Cycle.


Variable names should be shown. I only see the variables' numeric IDs.
cwyoo
Site Admin
 
Posts: 379
Joined: Sun Jun 22, 2014 2:38 pm

Re: Datasets

Postby meninonas » Mon Feb 09, 2015 3:33 pm

Dr. Yoo,

Please find the Cell Cycle dataset below with the first two Banjo analyses.
Attachments
Cell Cycle.csv
(231.56 KiB) Downloaded 134 times
top.graph.2015.02.09.14.47.14.jpg (1.93 MiB) Viewed 1868 times
top.graph.2015.02.09.13.35.49.jpg (1.65 MiB) Viewed 1868 times
top.graph.2015.02.09.12.26.47.jpg (2.22 MiB) Viewed 1868 times
meninonas
 
Posts: 137
Joined: Tue Jun 24, 2014 3:25 pm

Re: Datasets

Postby cwyoo » Tue Feb 10, 2015 11:59 am

This is the TCGA 2012 Breast Cancer dataset, all merged.
Attachments
TGCA_2012_all_merged.csv
(3.61 MiB) Downloaded 134 times

Re: Datasets

Postby meninonas » Tue Feb 10, 2015 1:54 pm

Dr. Yoo,

I have coded the datasets according to Mutation, Amplification, and Expression. I was not able to merge them into one large dataset because LibreOffice Calc cannot open the original dataset in its entirety. LibreOffice Calc gives me the following error:

Code:
The data could not be loaded completely because the maximum number of columns per sheet was exceeded.


I would suggest opening the datasets in Microsoft Office if you have it available to you.

I am currently working on labeling the genes according to _MUT, _AMP, and _EXP. Those should be ready tomorrow.
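
One way to add the _MUT, _AMP, and _EXP labels without opening the files in a spreadsheet (and so without hitting the column limit) is to rewrite only the header row with a script. The following is a minimal Python sketch, not the method used in this thread. It assumes each file has a single header row and a sample-identifier column named ID; that column name and the gene names in the demo are hypothetical, since the layout of the attachments is not shown here.

```python
import csv
import io


def label_columns(source, dest, suffix, id_column="ID"):
    """Copy a CSV from source to dest, appending suffix to every header
    name except the identifier column. Returns the number of columns."""
    reader = csv.reader(source)
    writer = csv.writer(dest, lineterminator="\n")
    header = next(reader)
    writer.writerow(
        [name if name == id_column else name + suffix for name in header]
    )
    # Data rows are streamed through unchanged, one row at a time.
    writer.writerows(reader)
    return len(header)


# Small in-memory demo with made-up values.
demo_in = io.StringIO("ID,NRF1,TP53\ns1,0,1\n")
demo_out = io.StringIO()
column_count = label_columns(demo_in, demo_out, "_MUT")
print(column_count)
print(demo_out.getvalue())
```

For real files, the same function can be called with file objects opened with newline="" as the csv module recommends.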
Attachments
Mutated Data.csv
(1.9 MiB) Downloaded 140 times
Amplified Data.csv
(1.9 MiB) Downloaded 122 times
Expressed Data.csv
(1.9 MiB) Downloaded 133 times

Re: Datasets

Postby meninonas » Wed Feb 11, 2015 6:27 pm

Dr. Yoo,

I ran the dataset in Banjo and it gave me the following error:

Code:
[ERROR: Banjo 2.2.0, 2/11/15 5:06:39 PM]
Execution has stopped because Banjo has run out of available memory.
'java.lang.OutOfMemoryError: Java heap space'


I ran Banjo on 131.94.9.17 and it gave me the same error. I tried accessing the other servers but wasn't able to.
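
The OutOfMemoryError above refers to the Java heap, whose maximum size is set by the JVM's -Xmx option, so the limit can be reached before the machine's physical memory is exhausted. The following is a minimal Python sketch of building a launch command with a larger heap. The jar name, the settings file name, and the settingsFile= argument form are assumptions to check against the local Banjo installation; only -Xmx and -jar are standard java launcher options.

```python
import shlex


def banjo_command(heap_gb, jar="banjo.jar", settings="banjo.settings.txt"):
    """Build a java command line that raises the maximum heap to heap_gb GiB.

    jar and settings are hypothetical file names; the settingsFile=
    argument form is an assumption, not confirmed by this thread.
    """
    if heap_gb <= 0:
        raise ValueError("heap_gb must be positive")
    return ["java", f"-Xmx{heap_gb}g", "-jar", jar, f"settingsFile={settings}"]


# Show the command as it would be typed in a shell.
print(shlex.join(banjo_command(4)))
```

Whether a larger heap is enough depends on how much memory the server has; if it is not, reducing the number of variables is the remaining option.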

Re: Datasets

Postby cwyoo » Wed Feb 11, 2015 10:03 pm

meninonas wrote:Dr. Yoo,

I ran the dataset in Banjo and it gave me the following error:

Code:
[ERROR: Banjo 2.2.0, 2/11/15 5:06:39 PM]
Execution has stopped because Banjo has run out of available memory.
'java.lang.OutOfMemoryError: Java heap space'


I ran Banjo on 131.94.9.17 and it gave me the same error. I tried accessing the other servers but wasn't able to.


I told you to match the IDs before merging the clinical data with the expression data. You used the clinical data that is sorted by "Subtype".
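
Matching the IDs before merging means joining the two tables on the sample identifier rather than on row position, so the order of the clinical file (for example, sorted by "Subtype") no longer matters. The following is a minimal Python sketch, assuming both tables have already been read into lists of dictionaries (for instance with csv.DictReader) and share an identifier column named ID. That column name and the sample values are hypothetical.

```python
def merge_by_id(expression_rows, clinical_rows, id_column="ID"):
    """Join clinical columns onto expression rows by sample identifier.

    Returns (merged_rows, unmatched_ids). Row order follows the
    expression table; clinical row order is irrelevant.
    """
    clinical_by_id = {}
    for row in clinical_rows:
        key = row[id_column]
        if key in clinical_by_id:
            raise ValueError(f"duplicate clinical ID: {key}")
        clinical_by_id[key] = row

    merged, unmatched = [], []
    for row in expression_rows:
        clinical = clinical_by_id.get(row[id_column])
        if clinical is None:
            # Keep track of samples with no clinical record instead of
            # silently pairing them with the wrong row.
            unmatched.append(row[id_column])
            continue
        combined = dict(row)
        combined.update({k: v for k, v in clinical.items() if k != id_column})
        merged.append(combined)
    return merged, unmatched


expression = [{"ID": "s1", "NRF1": "1.2"}, {"ID": "s2", "NRF1": "0.4"}]
clinical = [{"ID": "s2", "Subtype": "Basal"}, {"ID": "s1", "Subtype": "LumA"}]
rows, missing = merge_by_id(expression, clinical)
print(rows, missing)
```

Because the join is keyed, the merged table has exactly one row per matched sample, and the number of columns is the sum of both tables' columns minus the shared identifier.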

Re: Datasets

Postby meninonas » Thu Feb 12, 2015 1:40 am

Dr. Yoo,

Please find the merged dataset below. Regardless, the file size remains the same, and the number of columns went up to 2841, an increase of about 700 columns. Consequently, the issue remains: both datasets are too large for the available memory.
Attachments
mergeddata.csv
(4.7 MiB) Downloaded 135 times

Re: Datasets

Postby cwyoo » Thu Feb 12, 2015 5:40 am

meninonas wrote:Dr. Yoo,

Please find the merged dataset below. Regardless, the file size remains the same, and the number of columns went up to 2841, an increase of about 700 columns. Consequently, the issue remains: both datasets are too large for the available memory.


What is the source of the error? Please post a file without any errors. What are the dimensions of the original data that I posted? After removing clinical variables and gene variables (state the reason why you deleted the variables), what are the dimensions?
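
The dimensions asked about here can be read directly from the CSV files, and comparing the two headers shows which columns a merge added. The following is a minimal Python sketch, assuming each file has one header row; the column names in the demo are made up.

```python
import csv
import io
from collections import Counter


def describe_csv(source):
    """Return (data_rows, columns, duplicated_names) for a CSV with a header."""
    reader = csv.reader(source)
    header = next(reader)
    data_rows = sum(1 for _ in reader)
    duplicated = sorted(name for name, n in Counter(header).items() if n > 1)
    return data_rows, len(header), duplicated


def added_columns(original_header, merged_header):
    """Columns present in the merged file but not in the original, in file order."""
    known = set(original_header)
    return [name for name in merged_header if name not in known]


demo = io.StringIO("ID,NRF1,TP53,TP53\ns1,1,2,3\ns2,4,5,6\n")
print(describe_csv(demo))
print(added_columns(["ID", "NRF1"], ["ID", "NRF1", "Subtype"]))
```

A non-empty list of duplicated names, or added columns that are not the expected clinical variables, would point to where extra columns came from.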

Re: Datasets

Postby meninonas » Fri Feb 13, 2015 3:00 pm

cwyoo wrote:
What is the source of the error? Please post a file without any errors. What are the dimensions of the original data that I posted? After removing clinical variables and gene variables (state the reason why you deleted the variables), what are the dimensions?


Dr. Yoo,

I tried calling you through Skype but you didn't answer.

I'm a little confused about the questions you asked in the post above. The error that Banjo gave me is that the computer doesn't have enough memory to deal with the size of the dataset. I have re-added the error below:

Code:
[ERROR: Banjo 2.2.0, 2/13/15 1:29:11 PM]
Execution has stopped because Banjo has run out of available memory.
'java.lang.OutOfMemoryError: Java heap space'


Consequently, I'm led to believe that this computer does not have the capacity to deal with datasets that contain 2000+ columns.

When you say to post a file without errors, what are you referring to? To my knowledge, none of the datasets have errors in them.

The original data that you posted (TGCA_2012_all_merged.csv) has dimensions of 463x2162.

The Merged, Amplified, and Mutated datasets do not contain the clinical variables because we had agreed to code the genes first and then choose which clinical variables to code. I'm assuming the only clinical variables you're interested in having are the highlighted ones you sent me through email. The dataset you sent me through email is attached below. None of the genes were deleted.
Attachments
Breast Invasive Carcinoma_TCGA Nature 2012_468 meged cases.xlsx
(3.4 MiB) Downloaded 127 times
TGCA_2012_all_merged.csv
(3.61 MiB) Downloaded 135 times

Re: Datasets

Postby cwyoo » Fri Feb 13, 2015 3:53 pm

meninonas wrote:
cwyoo wrote:
What is the source of the error? Please post a file without any errors. What are the dimensions of the original data that I posted? After removing clinical variables and gene variables (state the reason why you deleted the variables), what are the dimensions?


Dr. Yoo,

I tried calling you through Skype but you didn't answer.

I'm a little confused about the questions you asked in the post above. The error that Banjo gave me is that the computer doesn't have enough memory to deal with the size of the dataset. I have re-added the error below:

Code:
[ERROR: Banjo 2.2.0, 2/13/15 1:29:11 PM]
Execution has stopped because Banjo has run out of available memory.
'java.lang.OutOfMemoryError: Java heap space'


Consequently, I'm led to believe that this computer does not have the capacity to deal with datasets that contain 2000+ columns.

When you say to post a file without errors, what are you referring to? To my knowledge, none of the datasets have errors in them.

The original data that you posted (TGCA_2012_all_merged.csv) has dimensions of 463x2162.

The Merged, Amplified, and Mutated datasets do not contain the clinical variables because we had agreed to code the genes first and then choose which clinical variables to code. I'm assuming the only clinical variables you're interested in having are the highlighted ones you sent me through email. The dataset you sent me through email is attached below. None of the genes were deleted.


If we started with 463×2162 and added several clinical variables, how can you end up with 2800+ columns?

Use the expression data and include only a limited number of genes that are highly correlated with NRF1. Start with 2,000 genes and check whether Banjo runs on path-four. If it does not, remove the 500 genes least correlated with NRF1 each time and check again whether Banjo runs.
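
The selection procedure described above can be sketched as follows. This is a minimal Python illustration, not a prescribed implementation: it assumes the expression data is held as a mapping from gene name to a list of per-sample values, and it ranks genes by the absolute Pearson correlation with NRF1. Using the absolute value (so strongly negative correlations also count as "highly correlated") is an assumption; drop the abs() call if only positive correlation is wanted.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column is treated as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def rank_genes(expression, target="NRF1"):
    """Gene names (excluding target) ordered from most to least correlated."""
    reference = expression[target]
    scores = {
        gene: abs(pearson(values, reference))
        for gene, values in expression.items()
        if gene != target
    }
    return sorted(scores, key=scores.get, reverse=True)


def candidate_subsets(ranked, start=2000, step=500, target="NRF1"):
    """Yield shrinking gene lists: the top `start` genes, then `step` fewer
    each time, always keeping the target gene itself."""
    size = min(start, len(ranked))
    while size > 0:
        yield [target] + ranked[:size]
        size -= step
```

Each yielded list would be written out as a reduced dataset and tried in Banjo in turn, stopping at the first size that runs without the memory error.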
